The Relative Susceptibility of Two Rating Scales to 
Disturbances Resulting from Shifts in 
Stimulus Context * 


Donald T. Campbell, William A. Hunt, and Nan A. Lewis 


Northwestern University 


Several effects typically found in psycho- 
physical judgments with the method of single 
stimuli have been demonstrated by the au- 
thors in ratings of the degree of schizophrenic 
disturbance evidenced in responses to vocabu- 
lary items (2). For example, strongly skewed 
contexts produce a contrast effect in judg- 
ments of middle items, while sharp shifts in 
context produce a loss of discrimination. 
While such effects per se have practical value 
for the understanding and refinement of clini- 
cal judgment, they also may be used as cri- 
teria for the comparative evaluation of rat- 
ing scales, since they represent distortions of 
judgment to be avoided if possible. 

The present study compares the suscepti- 
bility to such distortions of two types of rat- 
ing scales—one a simple, numerical, nine- 
point scale called the “Simple,” the other, the 
“Detailed,” a nine-point scale on which each 
numerical point is provided with a verbal de- 
scription. 


Method 


The stimuli were vocabulary responses selected pri- 
marily from those scaled by Arnhoff (1), extended 
by the addition of some normal responses as previ- 
ously reported (2). Sample items with their values 
on a nine-point scale follow: 


. Gamble: to take a chance, a risk. 

. Fur: all of an animal’s coat. 

. Rim: outside diameter with a margin. 

. Envelope: something you put it in for them. 

9. Stave: that’s before, that’s long before not happiness. 


1 This study is part of a larger project subsidized 
by the Office of Naval Research under contract 7onr-450(11) 
with Northwestern University. The opinions 
expressed here are those of the individual authors 
and do not represent the opinions or policy of 
the Naval service. 


The stimuli were presented in booklet form with 
five items to a page, as indicated in Table 1. For 
the “Low-High” condition, each of the first 10 pages 
constituting the initial phase contained five items 
representing scale values from 1 to 5. The order of 
scale values was different for each page as deter- 
mined by a pair of balanced latin squares. The next 
six pages provided a gradual shift to higher scale 
values, with the last 10 pages each containing stimuli 
of Values 5 to 9. The “High-Low” condition used 
the same 26 pages in reverse order. Items of scale 
Value 5 are thus present in both low and high con- 
texts and provide the common denominator for 
evaluating shifts or disturbances in judgment. Full 
details of the counterbalanced design employed and 
the guarantees of equivalence of items of Value 5 in 
both low and high contexts are found in the previous 
study (2). Suffice it to say that two versions of 
each page were prepared for the initial and terminal 
phase, differing in the specific item of Value 5 em- 
ployed and so counterbalanced that while no judge 
rated the same item twice, the same specific items 
were judged under all conditions; and that within 
the initial and terminal phases, 10 different orders of 
page assembly were employed, following another 
balanced latin square. In all, 80 different types of 
booklet were employed, two of each being used. 

For the Simple form, the instructions to the judges 
were as follows: 


On the pages that follow you will be shown defi- 
nitions of vocabulary words made by both nor- 
mal and schizophrenic individuals. Your task is 
to rate each one of these definitions according to 
the degree of organization and eccentricity which 
you think is present. You are to make your rat- 
ings on a nine (9) point scale which ranges from 
“well-organized and normal” to “totally disorgan- 
ized and eccentric.” The category one (1) should 
represent the most organized and normal defini- 
tions. The category nine (9) should represent the 
maximal amount of disorganization and eccentricity 
as found in schizophrenic thinking. The 
intermediate categories should indicate amounts in 
accordance with their numerical value. You will 
simply write the number which you think best 
depicts each definition at the beginning of each 
statement. Be sure that you do not skip any defi- 
nition, and rate each one as you come to it. 


It is very important that you do not rate defini- 
tions according to how intelligent you think the 
person was who made the statement. Hence, even 
though a definition is incorrect, if it is in no way 
eccentric or disorganized, it should be rated to- 
ward the low end of the scale. On the other hand, 
if it shows signs of disorganization or eccentricity, 
even though fairly accurate, it should be rated to- 
ward the high end of the scale. Try to be as dis- 
criminating as possible. 


For the Detailed form, where each scale point was 
verbally defined, the instructions to the Ss were as 
follows: 


On the pages that follow, you will be shown definitions 
of vocabulary words made by both normal 
and schizophrenic individuals. Your task is to 
rate each one of these definitions according to the 
degree of organization and eccentricity which you 
think is present. The scale provided below is to 
be used in making your ratings. You will simply 
write the number which you think best depicts 
each definition at the beginning of each statement. 
Be sure that you do not skip any definition, and 
rate each one as you come to it. 


It is very important that you do not rate definitions 
according to how intelligent you think the 
person was who made the statement. Hence, even 
though a definition is incorrect, if it is in no way 
eccentric or disorganized, it should be rated toward 
the low end of the scale. On the other hand, 
if it shows signs of disorganization or eccentricity, 
even though fairly accurate, it should be rated toward 
the high end of the scale. Try to be as discriminating 
as possible. 


1. A very normal, well-organized definition. 

2. A fairly normal, well-organized definition. 

3. Very slight traces of disorganization and eccentricity 
are present. 

4. Distinct traces of disorganization and eccentricity 
are shown. 

5. Obvious signs of disorganization and eccentricity 
are present. 

6. Very eccentric and disorganized, but still showing 
signs of contact with reality. 

7. Very eccentric and disorganized, showing only 
a thin thread of coherence. 

8. Extremely disorganized and eccentric. 

9. Totally disorganized, eccentric, and out of contact 
with reality. 


Table 1 
Design of the Experimental Booklets 


Table 2 
Mean Ratings of the Item of Value 5 on the Last Page of the Initial Phase 

High Context 
(High-Low Group) 
Mean Ci 
3.40 5.88 
3.77 4.78 


The Ss were 160 undergraduate students in an in- 
troductory psychology course at Northwestern Uni- 
versity, allowing 40 Ss for each of the four condi- 
tions provided for by using each of the two sets of 
instructions with both Low-High and High-Low con- 
text changes. Sampling equivalence for the groups 
was achieved by randomizing the booklets before 
distribution to the Ss. 


Results 


Differences between groups exposed to dif- 
ferent stimulus contexts. We can examine the 
biasing effects of stimulus context by compar- 
ing ratings of our middle value stimuli as 
given by the High-Low and Low-High groups 
at the end of the initial context (p. 10 in the 
booklet of stimuli). We should expect a con- 
trast effect, with the middle value items be- 
ing rated as less disturbed when they appear 
in a context of highly disturbed responses 
(High-Low group) and more disturbed when 
presented in a context of less disturbed re- 
sponses (Low-High group). As Table 2 
shows, the contrast effect appears clearly for 
both ratings forms. However, it is much less 
marked for the Detailed form. Since the 
number of cases involved in each comparison 
is equal, we can make a direct comparison 
between t ratios. The difference between these 
t ratios is itself highly significant 
(t = 3.14, p < .002), confirming the greater 
susceptibility of the Simple form to distortions 
produced by stimulus context. As such distortions 
of judgment are undesirable, we may call the 
Detailed rating form superior in this regard. Note 
that apart from response to distorting context, the 
two scales would probably produce different means 
and standard deviations. The comparison of 
t ratios enables us to compare the degree of 
distortion independent of these differences. 


(N = 40 for Each Mean) 

Low Context 
(Low-High Group) 
Mean 
6.80 
4.95 

Shifts in judgment produced by reversal of 
stimulus context. Here we are interested in 
the changes produced by reversing the con- 
text from low to high or high to low as the 
experiment proceeds. This involves a com- 
parison of the ratings of the “5” items in the 
initial context (pp. 1-10 of the booklet) with 
those in the final context (pp. 17-26). Where 
our previous analysis was in terms of groups, 
we will now analyze our data in terms of indi- 
vidual Ss. This is necessitated by the fact 
that whereas contrast phenomena are strongly 
manifested in the comparison of initial con- 
texts, reversing the context in a single experi- 
mental session has a mixed effect, producing 
contrast effects in some Ss, but assimilation 
effects in others (2). Thus some Ss showed 
the expected effect of contrast, and when the 
general context of items moved from low to 
high, for example, their judgments of the 
“5’s” became lower than they had been. But 
for the assimilators, the shift of context from 
low to high resulted in rating the “5’s” higher 
than they had been rated before. Thus for 
some Ss, the value of the middle items 
changed toward the new context rather than 
away from it as contrast would demand. Since 
“assimilators” and “contrastors” move in opposite 
directions, they tend to cancel one another 
in any group treatment and the data 
must be analyzed in terms of changes in judg- 
ment for single individuals. 

To make this evaluation of individual shifts 
maximally independent of the judgmental 
idiosyncrasies of individual judges, we computed 
separately for each judge a t ratio between 
his judgments of the ten “5” items in 
the initial context (pp. 1-10) and the ten 
“5’s” of the terminal context (pp. 17-26). 
Each of these t ratios was then used as a 
score for an individual judge, representing his 
inconstancy in the face of the shifting context. 
Since the shifts take place in two directions, 
we have used minus signs to designate assimilation 
errors and plus signs to indicate contrast. 
For the Simple rating form the t’s 
ranged from −7.22 to +7.04, and for the 
Detailed form from −8.96 to +6.50. These 
t’s show surprisingly large values considering 
that they are based upon an N of only 20 
items. 
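
The per-judge score described above can be sketched as follows. This is a minimal illustration only: it assumes an ordinary pooled-variance t ratio between two independent sets of ten ratings, which the text does not spell out, and the function name `signed_t`, the `context_shift` argument, and the ratings in the usage note are hypothetical. The sign convention follows the text: plus for contrast, minus for assimilation.

```python
import math
from statistics import mean, variance


def signed_t(initial, terminal, context_shift):
    """Pooled-variance t ratio between a judge's initial and terminal ratings.

    context_shift is +1 when the context moved from low to high (Low-High
    booklet) and -1 when it moved from high to low (High-Low booklet).
    The result is positive when the ratings moved away from the new
    context (contrast) and negative when they moved toward it
    (assimilation).
    """
    n1, n2 = len(initial), len(terminal)
    pooled = ((n1 - 1) * variance(initial) + (n2 - 1) * variance(terminal)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled * (1 / n1 + 1 / n2))
    if standard_error == 0:
        raise ValueError("ratings have no variability; t is undefined")
    shift = mean(terminal) - mean(initial)
    return -context_shift * shift / standard_error
```

For a hypothetical Low-High judge whose ten initial ratings alternate between 5 and 6 and whose ten terminal ratings alternate between 4 and 5, `signed_t(initial, terminal, +1)` is about +4.24, a contrast shift; the same ratings from a High-Low judge would score about −4.24, an assimilation shift.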

To analyze the data for degree of shift, dis- 
regarding its direction, we have compared the 
two rating forms in terms of the magnitude of 
the t ratios, disregarding sign. There are 
slight trends in both the Low-High and High-Low 
groups for the smaller t’s to be found 
with judges using the Simple form. These 
trends do not reach significance, however, and 
when Low-High and High-Low groups are 
pooled, the t ratio of the Simple and Detailed 
form comparison (using the individual t ratios 
as scores) is 1.30, giving a p value less than 
.19. Thus, while shifts in judgment certainly 
occurred as a result of shifting the stimulus 
context during the experiment, the two rating 
forms show no significant differences in the 
magnitude of these shifts. 

If we examine the plus (contrast) and 
minus (assimilation) t’s, the differences in 
sign can tell us whether either rating form 
favors the phenomena of contrast or of assimilation. 
The 80 Ss using the Simple form 
separate into 47 contrastors and 33 assimilators. 
This trend is reversed for the Detailed 
form which offers 29 contrastors, 47 assimilators, 
and 4 Ss showing neither. The difference, 
however, is not significant, an overall t 
ratio employing as scores the individual t’s 
with their signs regarded being only .88. 
Table 3 

Mean r’s Between Experimental Values and 
Previous Standardization Values 

                Initial Phase (pp. 1-10)       Terminal Phase (pp. 17-26) 
                Group  Detailed  Simple        Group  Detailed  Simple 
Low Context     L-H    .655      .655          H-L    .537      .575 
High Context    H-L    .642      .528          L-H    .557      .454 

Accuracy of discrimination. An obviously 
important characteristic of any scale is the 
accuracy of the ratings it yields. Each judge 
was given two accuracy scores, one for his 
judgments of the 50 items of the initial phase, 
one for his judgments of the 50 items of the 
terminal phase. Correlation coefficients be- 
tween his ratings of the 50 items and their 
standardization values constituted these scores. 
The coefficients were transformed into z scores 
to provide normally distributed individual in- 
dices for testing the significance of the differ- 
ences between the experimental groups. The 
mean values, retransformed into r’s are shown 
in Table 3. Our first comparison for accuracy 
concerned the initial phase (pp. 1-10). For 
the Low-High groups the mean r was .655 for 
the Simple rating form and .655 for the Detailed. 
For the High-Low groups the comparable 
mean r’s were .528 and .642. The t 
ratio of this last difference is 5.24, significant 
beyond the .0001 level, indicating that while 
the two forms are equal when used with low 
context items, the Detailed form is superior 
when used with high context items. Similar 
findings appear when the coefficients for the 
terminal phase are inspected. For the Low-High 
groups (now judging items in a high 
context) the mean r’s were .454 for the Simple 
form and .557 for the Detailed, t = 3.39, 
p < .0006. For the High-Low groups (now 
judging items in a low context) no significant 
differences were found between the two forms 
(values of .575 and .537). We can conclude 
that the Detailed form provides more ac- 
curacy when used in a high disturbance con- 
text. 
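
The averaging of accuracy scores described above (transform each judge's r to z, average, and transform back) can be sketched as follows. The sketch assumes the z scores are Fisher's transformation, z = atanh(r), which is the usual reading but is not stated explicitly in the text; the function names are illustrative.

```python
import math


def fisher_z(r):
    """Fisher's r-to-z transformation."""
    return math.atanh(r)


def inverse_fisher_z(z):
    """Transform a z value back into a correlation coefficient."""
    return math.tanh(z)


def mean_r(correlations):
    """Average correlations in the z metric, then retransform to r."""
    z_values = [fisher_z(r) for r in correlations]
    return inverse_fisher_z(sum(z_values) / len(z_values))
```

Averaging in the z metric gives more weight to high correlations than a plain arithmetic mean does: for hypothetical judge scores of .20 and .80, `mean_r([0.20, 0.80])` is about .57 rather than .50.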







It also seemed appropriate to get an overall 
measure of accuracy by correlating the judges’ 
ratings on all 100 items with the standardiza- 
tion values. The judges using the Detailed 
form performed better than those using the 
Simple form, although the difference was not 
as significant as some reported above. The 
mean r’s were .764 for the Detailed form and 
.700 for the Simple, t = 2.26, p < .024. 

Loss of refinement of discrimination with 
shift in context. One of the most traditional 
findings when using the method of single 
stimuli is the loss of refinement of discrimina- 
tion which occurs with any drastic shift in 
the stimulus range such as is produced by the 
introduction of an extreme anchor or by 
shifts in context such as those in the present 
experiment. This loss in discrimination is 
illustrated in Table 3 which presents the 
mean r values in terms of a 2 × 2 latin 
square. Note that each of the four values to 
the right is lower than the corresponding one 
on the left. This illustrates the drop in ac- 
curacy in the terminal phase following the ex- 
treme shift in context. A control seems to be 
lacking here since the shift in context is con- 
founded with possible effects such as fatigue. 
A careful, page-by-page analysis of the data 
in our previous experiment (2), however, 
showed no general drop within each phase; 
rather, the loss in discrimination was clearly 
concentrated at the point of transition be- 
tween contexts. 

To evaluate the significance of the trends 
in Table 3 the data have been analyzed as a 
crossover design (3). The effect of phase is 
highly significant with an F ratio of 17.56 for 
the Simple form and 36.85 for the Detailed. 
Thus the Detailed form seems to show the 
greatest drop in accuracy of discrimination. 
In part, this higher F ratio for the Detailed 
form is attributable to a slightly smaller error 
term, but it is also attributable to a larger 
mean square for phases in the analysis of vari- 
ance. The significance of this finding is hard 
to evaluate and its interpretation is complex. 
It probably should not be taken as a sign of 
inferiority for the Detailed form, as even with 
the greater drop, the Detailed form averages 
somewhat superior to the Simple form for the 
terminal phase, as is indicated in Table 3. 


Summary 


This study has compared two rating scales 
in terms of their resistance to the distorting 
effects produced by limited and shifting con- 
texts of stimulus materials. The assignment 
called for rating the degree of schizophrenic 
disturbance shown in definitions of words. 
One nine-point scale provided a minimum of 
descriptive material, while the other provided 
a verbal characterization for each of the nine 
scale values. The distorting effects examined 
were of two general kinds: shifting in the 
value of common stimuli as a function of con- 
text, and the loss of refinement or correla- 
tional accuracy. While in the subtler details 
the overall picture is complex, in general, the 
detailed rating scale has shown itself to be 
superior. It provides more nearly equivalent 
judgments from comparable groups of raters 
judging common items in disparate stimula- 
tion contexts. For stimulus materials in the 
high disturbance range, it provides a greater 
correlational accuracy. 

More generally, the approach suggests the 
utility of creating experimental stress tests in 
the evaluation of rating scales and other judg- 
mental procedures. 


Received September 26, 1957. 
A Factor Analysis of Variables Related to Driver Training * 


Andrew L. Comrey 


University of California, Los Angeles 


A course given in the Los Angeles High 
Schools called “Driver Training” is devoted 
to observation, instruction, and actual behind- 
the-wheel practice. Each student electing 
this course is required to fill out a card giving 
certain biographical and other data. Instructors 
add information to these cards concerning 
the students’ performance. There 
were 1491 such students during the spring 
semester of 1954, a period chosen for study 
because it was long enough after the initia- 
tion of the program for record keeping to 
have become standardized and long enough 
ago to make it possible for the individuals in- 
volved to have accumulated a driver record. 

The names of these 1491 individuals were 
sent to the California State Department of 
Motor Vehicles to obtain the record of their 
subsequent accidents and traffic violations up 
to February, 1957. No driver record could 
be located for 373 of these individuals and 
two additional cases were dropped at random 
to make the total number of cases conform to 
the demands of the particular computing pro- 
gram used. The remaining cases were divided 
into two groups of 576 and 540, respectively, 
by alternate selection from an alphabetized 
list. Two completely independent and sepa- 
rate analyses of the same kind were carried 
out on these randomly composed samples. 
This procedure was adopted to provide an in- 
dication of the consistency of results. 

1 This investigation was carried out in conjunction 
with the research program of the Delinquency Control 
Institute, School of Public Administration, the 
University of Southern California, Dan Pursuit, Director. 
Research funds were provided by the Institute’s 
donors: The Automobile Club of Southern 
California, the Hollywood Turf Club Associated 
Charities, and the Farmers Insurance Group. The 
author is greatly indebted to Melvin Schroeder of 
the Los Angeles public schools and to Fred Williams 
and Floyd Kortright of the California State Department 
of Motor Vehicles for providing the data used 
in this analysis. This was especially difficult and 
time consuming for the Department of Motor Ve- 

The recorded information available on each 
case was, unfortunately, very limited. Thirty-six 
variables were extracted from the school 
and department of motor vehicle records, 
however, which seemed to offer the possibility 
of sufficient information to justify an 
analysis. Some of these variables were available 
in continuous form, such as “height,” 
and others were available in only dichotomous 
form, e.g., “sex,” but all variables in 
the analysis were reduced to dichotomous 
form before computing phi coefficients, intercorrelating 
each variable of the 36 with each 
other. Two matrices were obtained, one for 
each sample. 
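
The reduction to dichotomous form and the phi coefficient mentioned above can be sketched as follows. This is an illustrative sketch, not the computing program used in the study: the function names, the cut-off rule (scoring 1 for values above the cut), and the example values are assumptions.

```python
import math


def dichotomize(values, cut):
    """Score 1 for values above the cut point, 0 otherwise."""
    return [1 if value > cut else 0 for value in values]


def phi_coefficient(x, y):
    """Phi coefficient between two dichotomous (0/1) variables."""
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denominator == 0:
        raise ValueError("one variable is constant; phi is undefined")
    return (a * d - b * c) / denominator
```

For example, hypothetical heights of 60, 70, 65, and 72 in. cut at 66 in. become `[0, 1, 0, 1]`, and correlating that vector with itself gives a phi of 1.0. Repeating `phi_coefficient` over every pair of the 36 dichotomized variables would fill one intercorrelation matrix.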

The variables used are listed in Table 1. 
The motor vehicle data were classified and 
grouped into categories, forming Variables 
1 through 8. A frequently occurring in- 
fraction could be treated separately, but sel- 
dom invoked infractions had to be combined 
with others of a somewhat similar type. 
Variable 2, for example, includes several 
kinds of violations, but mostly traffic light 
and boulevard stop violations. Unsafe driv- 
ing, or “moving,” violations other than those 
in Variables 2 and 4 were grouped in Vari- 
able 3. Variable 11 concerns a difference 
in address between high school and depart- 
ment of motor vehicles records. Variables 
12 through 18 were included to test the as- 
sociation of geographical location with the 
other variables. Schools were grouped ac- 
cording to area of the city. Variable 19 is 
also a school variable, indicating attendance 
at one of three special schools with many 
problem students. For Variables 21 and 23, 
separate medians for girls and boys were 
taken, using small random samples from the 
total group. Dichotomization was carried out 
with respect to these rough measures of cen- 
tral tendency. Variable 31 refers to the num- 
ber of hours the student spent in an instruc- 
tion car as an observer. For each variable, 
the positive side of the dichotomy is de- 
scribed or listed first. 

Table 1 

A Summary of the Principal Rotated Factor Results 

Variable 

 1. One or more nonmoving violations 
 2. One or more signal violations 
 3. Unsafe driving violations 
 4. One or more speeding violations 
 5. Two or more violations 
 6. One or more accidents 
 7. A violation plus an accident 
 8. Restrictions on driver license 
 9. School Grade 12 status vs. others 
10. Los Angeles address vs. others 
11. Change of address after school 
12. Valley area schools 
13. Harbor area schools 
14. Eastern area schools 
15. Southern area schools 
16. Metropolitan area schools 
17. Western area schools 
18. Hollywood area schools 
19. Special schools 
20. Sex (male vs. female) 
21. Height (above Mdn for own sex) 
22. Age (17 or more) 
23. Weight (above Mdn for own sex) 
24. Eye color (dark vs. others) 
25. Hair color (dark vs. others) 
26. Father has a business address 
27. Father has a business phone 
28. Mother has a business address 
29. Mother has a business phone 
30. Both parents are living 
31. 12 or more observation hours 
32. Driver Training Grade of A 
33. Driver Training Grade of A or B 
34. 50 or more class instruction hours 
35. Three or more students in car 
36. Unfavorable instructor remarks 

A B C 
62 
49 13 
42 
70 
91 
38 

Note. Decimal points have been omitted. 


2 All the calculations for this study were carried 
out on SWAC, an electronic computer operated by 
Numerical Analysis Research at the University of 
California, Los Angeles, and supported by the Office 
of Naval Research. The opinions expressed here are 
the author’s and do not necessarily represent those 
of the US Navy. The complete correlation, centroid, 
and rotated factor tables have been deposited with 
the American Documentation Institute. Order Document 
No. 5595 from the ADI, Auxiliary Publications 
Project, Photoduplication Service, Library of Congress, 
Washington 25, D. C., remitting in advance 
$1.25 for 35 mm. microfilm or $1.25 for photocopies 
readable without optical aid. Make check payable 
to Chief, Photoduplication Service, Library of Congress. 


Eighteen centroid factors were extracted 
from each matrix of phi coefficients.² Each 
set of 18 centroid factors was independently 
rotated analytically using Kaiser’s orthogonal 
Varimax Method (1). This procedure provides 
a solution approximating simple structure 
by maximizing the variance of the 
squared extended vector projections over all 
possible pairs of factors. Iteration proceeds 
until an acceptable degree of convergence occurs. 
A good simple structure was obtained 
in both analyses. 
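
The pairwise rotation described above can be sketched in a few lines. This is a hedged illustration rather than the SWAC program used in the study: it applies the closed-form planar rotation angle for each pair of factors and sweeps over all pairs until the angles become negligible. It works on raw loadings, whereas the "extended vector" wording in the text suggests the rows were first normalized to unit length; that step is omitted here. The function name, the tolerance, and the sweep limit are assumptions.

```python
import numpy as np


def varimax(loadings, max_sweeps=50, tol=1e-10):
    """Orthogonal varimax rotation by successive planar rotations of factor pairs."""
    rotated = np.array(loadings, dtype=float)
    n_variables, n_factors = rotated.shape
    for _ in range(max_sweeps):
        largest_angle = 0.0
        for i in range(n_factors - 1):
            for j in range(i + 1, n_factors):
                x = rotated[:, i].copy()
                y = rotated[:, j].copy()
                u = x * x - y * y
                v = 2.0 * x * y
                a, b = u.sum(), v.sum()
                c = (u * u - v * v).sum()
                d = 2.0 * (u * v).sum()
                numerator = d - 2.0 * a * b / n_variables
                denominator = c - (a * a - b * b) / n_variables
                # Angle that maximizes the variance of squared loadings
                # for this pair of factors.
                angle = 0.25 * np.arctan2(numerator, denominator)
                rotated[:, i] = x * np.cos(angle) + y * np.sin(angle)
                rotated[:, j] = -x * np.sin(angle) + y * np.cos(angle)
                largest_angle = max(largest_angle, abs(angle))
        if largest_angle < tol:
            break  # every pair is already at its optimum
    return rotated
```

Because each step is an orthogonal rotation in the plane of two factors, the communality of every variable is unchanged; only the distribution of its loadings across factors is altered.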


Results 


Of the 18 factors extracted in each analy- 
sis, 16 were sufficiently identical to warrant 
being called the same factor. Two in each 
analysis failed to match. Since the results 
for the two analyses were so similar, figures 
are given in Table 1 only for the first analy- 
sis in order to conserve space. For the fig- 
ures given in Table 1, the corresponding 
values for the second analysis agree within 
.05, except in the following cases: Factor A, 
Variables 2 and 3 were .41 and .58; Factor B, 
Variable 4 was .24; Factor D, Variable 18 
was — .12; Factor E, Variable 19 was — .25; 
and Factor G, Variable 19 was .14. The cri- 
terion for including a figure in Table 1, rather 
than leaving a blank space, was that the load- 
ing had to be .1 or more in both analyses, or 
.2 or more in either analysis. 


Of the 18 factors extracted and rotated, 
only the main seven, Columns A through G, 
will be given in Table 1. The column headed 
by “p” gives the proportion of cases above 
the dichotomy point. Eight additional factors 
were of some size, but six of these proved 
to be determined almost exclusively by the 
geographical area school variables, 11 through 
18, and Variable 10. Since the division of the 
students into geographical school areas intro- 
duced artificial interdependencies, and, hence, 
spurious correlations between these variables, 
it was necessary that several such factors 
emerge. Their appearance in no way distorts 
the principal factors, however, since a suffi- 
cient number of factors was extracted. Two 
other factors were confined to the Variables 
26 through 30. These factors failed to have 
major loadings for variables from the other 
sectors of interest. The remaining three fac- 
tors were very minor, two failing to be 
matched in the two analyses. 

The seven factors given in Table 1 are 
readily interpretable as: A. Traffic Law Violation, 
B. Accidents, C. Age, D. Course Grades, 
E. Greater Car Use, F. Physical Size, and G. 
Dark Coloring. A striking feature of the re- 
sults is the relative independence of the tend- 
encies to have accidents and to receive traffic 
citations. The major factor representing most 
of the variance for the citation variables had 
no loading of any importance for the accident 
variable. There was a sizeable loading for 
the variable “one or more accidents and one 
or more violations,” but since no loading ap- 
peared for the pure accident variable, only 
the “violations” part of this complex variable 
presumably is involved. A small amount of 
the traffic citation variance did appear on the 
major factor defining accidents, however. The 
variable “one or more speeding violations” 
had loadings of .12 and .24, respectively, in 
the two analyses. This picture is supported 
by reference to the original correlations be- 
tween “one or more speeding violations” and 
“one or more accidents.” For the two analyses, 
these phi coefficients were .16 and .24, respectively. 
Although these correlations are 
both significant beyond the .01 level, the pro- 
portion of common variance indicated by 
these coefficients is less than six per cent. 
For the particular population represented by 
these samples, therefore, it must be concluded 
that there is only a slight tendency for per- 
sons receiving speeding citations also to have 
had accidents. With respect to nonspeeding 
citations, there seems to be no relationship 
with accidents. 
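
The two claims in this paragraph, significance beyond the .01 level and less than six per cent common variance, can be checked with a short calculation. The sketch below assumes the usual identity for a 2 × 2 table, chi square = N times phi squared with one degree of freedom, and assumes that the .16 and .24 coefficients belong to the samples of 576 and 540 in that order; the text does not say which significance test was actually applied.

```python
CRITICAL_CHI_SQUARE_01 = 6.635  # chi square, 1 df, .01 level


def common_variance(phi):
    """Proportion of variance shared by two variables correlated phi."""
    return phi * phi


def chi_square_from_phi(phi, n):
    """Chi square (1 df) implied by a phi coefficient on n cases."""
    return n * phi * phi


# Assumed pairing of coefficients with sample sizes.
for phi, n in ((0.16, 576), (0.24, 540)):
    shared = common_variance(phi)
    chi_square = chi_square_from_phi(phi, n)
    print(f"phi={phi:.2f} N={n} shared variance={shared:.4f} "
          f"chi square={chi_square:.2f} "
          f"significant at .01: {chi_square > CRITICAL_CHI_SQUARE_01}")
```

Under these assumptions the shared variance is about 2.6 and 5.8 per cent, and both chi square values exceed the .01 critical value, in line with the statements above.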

There is little doubt that male drivers in 
this population are more likely to receive cita- 
tions and to have had more accidents than 
female drivers. It is interesting to note, how- 
ever, that the proportion of common variance 
with the sex variable is about five times as 
great for the traffic-law-violations factor as for 
the accidents factor. In fact, the less than 
four per cent common variance between the 
accidents factor and sex is surprising in view 
of the relative insurance risks commonly as- 
signed to young male and female drivers. 

Grades in driver training courses appar- 
ently have no validity for predicting who will 
receive traffic citations or have accidents. 
Socioeconomic and geographic variables also 
failed to show any important relationships to 
traffic citations or accidents. The dark coloring 
factor failed to have any loading for the 
accident or citation variables, tending to sug- 
gest that Mexican and Negro groups are not 
markedly different from other groups in these 
respects. Students in the special-problem 
schools showed a slight tendency to have 
more citations although this same tendency 
did not appear with respect to accidents. 
Since these schools have many “problem” 
students, the relationship with traffic offenses 
is not surprising, except, perhaps, because it 
is so low. The low p value of .04 may be 
partially responsible for this, however, since 
phi is dependent upon the marginal totals. 
This study has succeeded more in showing 
what accidents are not related to than what 
they are related to. In short, with respect to 
the population considered, we cannot single 
out the traffic offender, the poor student, the 
minority group member, the male, or the 
driver from a particular part of town as being 
much more likely than any other randomly 
selected population member to have 
had an accident. Since the present popula- 
tion consists only of individuals who elected 
driver training, it cannot necessarily be as- 
sumed that these same results would hold in 
the wider population. Many of these vari- 
ables may be more highly related to accidents 
in the general population, since individuals 
electing driver training are apt to be more 
conforming and generally less characterized 
by irresponsible behavior which may lead to 
accidents. 
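
The earlier remark that phi depends on the marginal totals, made in connection with the p value of .04 for the special schools, can be illustrated numerically. The sketch computes the largest phi attainable when every case positive on the rarer variable is also positive on the commoner one. The comparison proportion of .50 in the usage note is hypothetical and is not taken from the study.

```python
import math


def max_phi(p1, p2):
    """Largest phi attainable for two dichotomies with positive proportions p1 and p2."""
    both_positive = min(p1, p2)  # the greatest possible overlap
    covariance = both_positive - p1 * p2
    return covariance / math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))
```

With p = .04 on one variable and a hypothetical p = .50 on the other, `max_phi(0.04, 0.50)` is only about .20, so even a perfect association would yield a modest coefficient.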
Effect of Time Limitation on Making Settings on a 
Linear Scale * 
A number of variables have been investigated 
regarding cursor positioning performance 
on a linear scale by means of control 
knobs (3, 4, 5). The most important vari- 
able seems to be control ratio, the ratio be- 
tween cursor movement and revolution of the 
control knob. Other variables such as knob 
diameter, friction, inertia, and backlash be- 
come important only under special conditions. 

The present study differed in two major re- 
spects from previous investigations. First, 
the time allowed an S to make a setting was 
varied systematically and, second, a measure 
of error (final discrepancy between cursor and 
target) was obtained. Varied simultaneously 
with time limitation were (a) direction of 
initial cursor displacement from target, (b)
distance of cursor travel, and (c) control 
ratio. Both time and error were measured. 


Method 


Apparatus. From the S’s point of view the appa- 
ratus consisted of a large panel with an eye level 
rectangular hole cut from the middle. When not 
covered by a shield, the linear scale was visible 
through the cut-out. The scale itself consisted of a 
plain white card, }” by 11”, with a hairline scribed 
vertically at the center point. The cursor, a piece 
of lucite with a vertical hairline, was controlled by a 
knob, 2 in. in diameter, located at a convenient position
for a seated right-handed S.

The knob shaft was coupled to the cursor by a 
ball disk integrator (Western Electric KS-8710) and 
a magnetic clutch. The clutch was energized only 
during the trial interval. At the instant each trial 
started, the shield in front of the scale was dropped, 
the clutch energized, and two of the three timers 
(Standard Electric S-1) were started. Three time 
intervals were measured: (a) the overall time from 
trial start to finish (total time), (b) the time from
trial onset until the cursor was moved to within
0.1 in. of the target (travel time), and (c) the time
from the 0.1 in. position until either the S was satisfied
with his alignment and threw a switch stopping
the timers or the trial was ended by the E because
the time allotted had expired (adjustment time). Time
was measured to the nearest 0.01 sec. and error to
the nearest 0.0025 in.

1 This article is derived from a thesis submitted by
the senior author to the Department of Psychology
of Lehigh University in partial fulfillment of the requirements
of the degree of Master of Arts. The authors
wish to express their indebtedness to W. L.
Jenkins.

2 Present address: Aero Medical Laboratory, Wright
Air Development Center, Wright-Patterson Air Force
Base, Ohio.

Procedure. All 12 Ss were right-handed and all 
participated in one practice session followed by nine 
experimental sessions. Sessions lasted about 45 mins. 
and were separated by approximately 48 hrs. 

Each S was required to make settings at 12 time 
intervals in decreasing order (4.0, 3.0, 2.6, 2.2, 1.8, 
1.6, 1.4, 1.2, 1.0, 0.8, 0.6, and 0.4 secs.). All Ss be- 
gan with the 4 sec. interval. 

During each experimental session only one control 
ratio was used, selection being determined by a 
counterbalanced order. Two Ss were assigned to 
each of the six possible orders of the three ratios 
(1 in., 2 in., and 4 in. of pointer movement per
revolution of knob). 

Sessions, then, consisted of Ss making 144 settings. 
At each of the 12 time intervals, 6 settings were 
made involving cursor displacement to the right and 
6 settings involving cursor displacement to the left. 
Short travel (15/16”) was involved in 3 of these set- 
tings and 3 involved long travel (50/16”). Except 
for decreasing time and control ratio, conditions 
within any session were presented in a chance fash- 
ion. All Ss participated in all experimental condi- 
tions. 
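
The bookkeeping of this design can be checked with a short Python sketch. It enumerates the settings of one session and derives the observation counts quoted later in the Results. The constant and function names are ours, and the sketch assumes that each control ratio was used in three of the nine experimental sessions, which the text implies but does not state outright. The chance ordering of conditions within a session is not modeled.

```python
from itertools import product

# Levels taken from the Procedure section
ALLOTTED_TIMES_SEC = [4.0, 3.0, 2.6, 2.2, 1.8, 1.6, 1.4, 1.2, 1.0, 0.8, 0.6, 0.4]
DIRECTIONS = ["right", "left"]
DISTANCES_IN = ["15/16", "50/16"]
REPLICATIONS = 3   # settings per time x direction x distance cell
N_SUBJECTS = 12
N_SESSIONS = 9     # experimental sessions per S (practice session excluded)
N_RATIOS = 3       # assumption: each ratio used in N_SESSIONS / N_RATIOS sessions


def session_schedule():
    """All settings of one session as (time, direction, distance, replication)."""
    return list(product(ALLOTTED_TIMES_SEC, DIRECTIONS, DISTANCES_IN,
                        range(REPLICATIONS)))


settings_per_session = len(session_schedule())
total_observations = settings_per_session * N_SESSIONS * N_SUBJECTS
sessions_per_ratio = N_SESSIONS // N_RATIOS
obs_per_condition = N_SUBJECTS * sessions_per_ratio * REPLICATIONS

print(settings_per_session)   # 144 settings per session
print(total_observations)     # 15552, one more than the 15,551 total df
print(obs_per_condition)      # 108 observations per plotted condition
```

Under these assumptions the counts agree with the 144 settings per session, the 15,551 total df, and the 108 observations per data point reported in the Results.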

Instructions. Excerpts from the Ss’ instructions 
follow. “You are going to be given a certain amount 
of time to make a setting. After several trials this 
time interval will be reduced. Your task is to turn 
the knob with your right hand so that the cursor 
hair-line exactly superimposes the scale hair-line. 
You are to do this as fast and as accurately as pos- 
sible.” 


Results 


Error. An analysis of variance for the or- 
thogonal variables of allotted-time interval, 
control ratio, distance of cursor travel, direc- 
tion of initial cursor displacement from target, 
Ss, and order of ratio presentation is summa- 
rized in Table 1. The error terms used for 
testing each of the main effects and interac- 
tions are indicated by numerical coding in the 
extreme right and left columns of the table. 
It will be seen that since Ss may be regarded
as a random sample of Ss in general, those 
higher-ordered interactions containing Ss are 
the appropriate error term for lower-ordered 
interactions and main effects (1, pp. 247- 
252). Since the assumption of homogeneity 
of variance was not met, the .001 level of
confidence was chosen as suggested by Lind- 
quist’s review of the Norton study (6, pp. 
78-90). 
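
The choice of error terms can be illustrated by recomputing a few F ratios. The Python sketch below divides the variance estimate of each main effect by that of its interaction with Ss, using values as we read them from Table 1 (the scan drops some decimal points, so these readings are assumptions). The dictionary and function names are hypothetical.

```python
# Variance estimates as read from Table 1; decimal points restored where the
# scan dropped them (e.g. ".8285"). Treat these readings as assumptions.
VARIANCE_ESTIMATES = {
    "T": 106.7534, "R": 17.2564, "Ds": 157.2590, "Dr": 2.6974,
    "TxS": 0.2634, "RxS": 0.2479, "DsxS": 0.8285, "DrxS": 0.4650,
}

# Error term for each main effect: its interaction with Ss
ERROR_TERM = {"T": "TxS", "R": "RxS", "Ds": "DsxS", "Dr": "DrxS"}


def f_ratio(effect):
    """F = variance estimate of the effect / variance estimate of its error term."""
    return VARIANCE_ESTIMATES[effect] / VARIANCE_ESTIMATES[ERROR_TERM[effect]]


for effect in ERROR_TERM:
    print(f"{effect}: F = {f_ratio(effect):.2f}")
```

The recomputed ratios fall within a few hundredths of the tabled F values (405.25, 69.59, 189.80, and 5.79), the small discrepancies presumably reflecting rounding of the printed variance estimates.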

It will be noted that in Table 1, five of the
32 sources of variation account for over 95%
of the total variance. In spite of this there
are 18 sources which prove to be significant
at the .001 level. This is due in large measure
to the power of the error estimate, the
within estimate containing 13,824 df.


Table 1

Summary of Analysis of Variance for Variables of Time, Ratio, Distance, Direction, Subjects, and Order

                                           Variance               Error
      Source                       df      Estimate       F       Term

  1   Time (T)                     11      106.7534     405.25*    10
  2   Ratio (R)                     2       17.2564      69.59*    13
  3   Distance (Ds)                 1      157.2590     189.80*    15
  4   Direction (Dr)                1        2.6974       5.79     16
  5   S’s within order (S)          6        1.3937      27.97*    36
  6   Order (O)                     5         .3368        -        5

  7   T × R                        22        3.0928      62.53*    35
  8   T × Ds                       11       38.9318     157.78*    21
  9   T × Dr                       11         .3684       1.75     22
 10   T × S                                   .2634       5.33*    35
 11   R × Ds                                13.9037      45.87*    24
 12   R × Dr                                  .2723       5.51     35
 13   R × S                                   .2479       5.01*    35
 14   Ds × Dr                                1.3807       3.60     26
 15   Ds × S                                  .8285      16.75*    35
 16   Dr × S                                  .4650       9.4*     35

 17   T × R × Ds                             2.6342      18.51*
 18   T × R × Dr                              .0570       1.15
 19   T × R × S                               .0547       1.11
 20   T × Ds × Dr                             .1639       1.01
 21   T × Ds × S                              .2467       5.02*
 22   T × Dr × S                              .2086       4.24*
 23   R × Ds × Dr                             .2557       4.11
 24   R × Ds × S                                          6.2*
 25   R × Dr × S                                          1.69
 26   Ds × Dr × S                                         7.8*

 27   T × R × Ds × Dr                                     1.60
 28   T × R × Ds × S                                      2.89*
 29   T × R × Dr × S                                      1.61
 30   T × Ds × Dr × S                                     3.29*
 31   R × Ds × Dr × S                                     1.26

 32   T × R × Ds × Dr × S
 33   Within                   13,824
      Total                    15,551

 34   Pooled nonsig. 27-33     14,352
 35   Pooled nonsig. 17-33     14,651
 36   Pooled nonsig. 7-33

* .001 probability.


Fig. 1. The relation of final error to allotted time, control ratio, and direction and magnitude of cursor travel.

Figure 1 shows the mean error for all 12 
Ss for all 144 experimental conditions. Each 
data point is based upon 108 observations. 
Panels of Fig. 1, from left to right, show the 
results for the 1 in., 2 in., and 4 in. control 
ratios, respectively. Within each panel are 
four curves representing long and short travel 
distance and right and left initial cursor displacement.
Although, in Table 1, Direction
was not found to be a significant source of
variance, when pairs of right versus left displacements
are tested by the Sign Test (2,
pp. 547-549) the left means differed from
the right at the .01 level.

Fig. 2. Error as a function of allotted time, control
ratio, and cursor travel distance.
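
For readers unfamiliar with the Sign Test, a minimal Python sketch follows. It computes an exact two-sided binomial probability from the signs of paired differences. The differences shown are invented for illustration and are not the study's data, and the tables in reference (2) may have been used in place of an exact computation.

```python
from math import comb


def sign_test_two_sided(differences):
    """Exact two-sided sign test p-value for paired differences.

    Zero differences (ties) are dropped before counting signs.
    """
    pos = sum(1 for d in differences if d > 0)
    neg = sum(1 for d in differences if d < 0)
    n = pos + neg
    if n == 0:
        return 1.0
    k = min(pos, neg)
    # Probability of k or fewer of the rarer sign when each sign has p = 1/2
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical left-minus-right differences in mean error (NOT the study's data)
example_diffs = [0.02, 0.01, 0.03, -0.01, 0.02, 0.04, 0.01, 0.02, 0.03, 0.01]
print(sign_test_two_sided(example_diffs))
```

Because the test uses only the direction of each paired difference, it can detect a consistent right-left asymmetry even when the differences are too small to reach significance in the analysis of variance.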

In order to demonstrate more clearly the 
effect of ratio, travel distance, and time al- 
lotted per setting upon error, right and left 
directions were combined. These means, each 
based upon 216 observations, are presented 
in Fig. 2. 

Time measures. The distribution of time 
measures was markedly skewed, especially at 
the short allotted times. This skewness is related
to the time pressure on the Ss, since,
when little time was allowed, Ss would often
fail to complete their setting. Because of this
response distribution, medians rather than 
means were computed. 

Figure 3 is similar to Fig. 2 except that 
time instead of error is plotted on the ordi- 
nate. Both travel and adjustment time are 
shown as a function of allotted time, ratio, 
and travel distance. Total time to make a 
setting may be obtained by adding travel and 
adjustment times for a given condition. Since 
a trial could be terminated by either the E 
(when the allotted time had expired) or the S
(when he was satisfied with his setting) it is 
impossible to tell from the data who termi- 
nated the trial if total time equals the al- 
lotted time. Thus, if an S’s total time came 
within 0.05 sec. of the allotted time, the adjustment
time data for that condition were
discarded; similarly, if travel time was within
0.05 sec. of allotted time, it too was discarded.

Nominally in Fig. 3 each data point rep- 
resents the median of 12 Ss, each of whom 
made 18 observations. Because of the cri- 
teria adopted regarding acceptability of the 
time scores, not all Ss are represented in each 
data point. If fewer than 6 Ss met
the criteria, no point was plotted; hence, in
Fig. 3, it will be seen that points are not presented
for all allotted times, especially for adjustment
time, which would be the first to
suffer if the trial were terminated by the E.
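
The screening of time scores described above can be sketched in Python as follows. The sketch applies the 0.05 sec. criterion trial by trial and takes the median of the Ss' medians. Both choices are our assumptions, since the text does not say whether screening was done per trial or per condition, nor how the Ss' scores were combined. It also reads the plotting criterion as requiring at least 6 Ss with usable scores. All names are hypothetical.

```python
from statistics import median

TOLERANCE_SEC = 0.05   # criterion stated in the text
MIN_SUBJECTS = 6       # assumed minimum number of Ss needed to plot a point


def usable_times(trials, allotted):
    """Return (travel_times, adjustment_times) that pass the screening.

    Each trial is a (travel_time, adjustment_time) pair in seconds.
    """
    travel, adjustment = [], []
    for travel_t, adjust_t in trials:
        # Travel time too close to the allotted time: discard the whole trial
        if round(allotted - travel_t, 2) <= TOLERANCE_SEC:
            continue
        travel.append(travel_t)
        # Total time too close to the allotted time: E may have ended the
        # trial, so the adjustment time is discarded
        if round(allotted - (travel_t + adjust_t), 2) > TOLERANCE_SEC:
            adjustment.append(adjust_t)
    return travel, adjustment


def plotted_point(per_subject_values):
    """Median of the Ss' medians, or None when too few Ss have usable scores."""
    medians = [median(values) for values in per_subject_values if values]
    if len(medians) < MIN_SUBJECTS:
        return None
    return median(medians)
```

For example, with an allotted time of 1.0 sec., a trial with 0.60 sec. of travel and 0.38 sec. of adjustment would keep its travel time but lose its adjustment time, because the total falls within 0.05 sec. of the limit.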

The sum of travel and adjustment time 
(total time) decreases markedly as allotted 
time is shortened. In addition, especially for 
long travel, the coarser ratios yielded shorter 
total times. If the discarded data had been 
included, the plotted data points would not 
have shifted materially and the travel time 
curves would approach the y = x line asymp- 
totically and pass through the origin. The 
adjustment time curves would approach the 
y = x - travel time line and also pass through
the origin. 
Fig. 3. The relation of travel and adjustment time
Fig. 4. The relation of travel and adjustment time
to allotted time, control ratio, and cursor travel distance.


Figure 4 shows median travel and adjust- 
ment time for long and short travel, right and 
left directions, with the 1 in., 2 in., and 4 in. 
control ratios. The plotted values include all 
time intervals and Ss; and thus, each point 
represents 1296 observations. Although dif- 
ferences were small, travel time was less, 
p < .05 (2, pp. 547-549), when the initial
cursor displacement was to the right of the 
target. 


Discussion 


Certain aspects of the present study may 
be compared to those of Jenkins and Connor
(3). These authors found that increasing
control ratio from 1 in. to 4 in. per revolu- 
tion did not decrease travel time appreciably. 
In the present study this is true only for the 
short travel distance. These authors also 
found that with an increase in control ratio, 
adjustment time increased. This is partially 
substantiated in the present study, but may 
be due to other factors. It may be
that because the higher ratios give faster
travel and thus allow more time for adjustment
within the allotted time, the S simply
uses all the time available to him to make the 
adjustment. The results in the present study 
may be related to the interaction between 
time pressure and ratio rather than ratio 
per se. 

This same time pressure-control ratio in- 
teraction hypothesis may serve to explain one 
of the major findings of the present study. 
When ample time is allowed, the accuracy of 
the setting is fairly independent of the con- 
trol ratio, but as the allotted time is reduced, 
especially with long travel distances, use of 
the coarse ratio clearly results in more ac- 
curate performance. That is, the coarser 
ratios yield superior accuracy under time 
pressure possibly because they effectively al- 
low more time for adjustment. 
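
One way to see why a coarser ratio may leave more time for adjustment is to compute the knob rotation each setting demands. The Python sketch below does this for the two travel distances and three control ratios used in the study. It assumes an ideal linkage with no slip or backlash, and the helper name is ours.

```python
from fractions import Fraction

# Control ratios from the Procedure: inches of cursor movement per revolution
CONTROL_RATIOS_IN_PER_REV = [1, 2, 4]

# Travel distances from the Procedure, in inches
TRAVEL_DISTANCES_IN = {"short": Fraction(15, 16), "long": Fraction(50, 16)}


def revolutions_needed(distance_in, ratio_in_per_rev):
    """Knob revolutions needed to move the cursor distance_in inches,
    assuming an ideal linkage (hypothetical helper)."""
    return Fraction(distance_in) / ratio_in_per_rev


for label, distance in TRAVEL_DISTANCES_IN.items():
    for ratio in CONTROL_RATIOS_IN_PER_REV:
        revs = revolutions_needed(distance, ratio)
        print(f"{label} travel, {ratio} in./rev: {float(revs):.3f} revolutions")
```

On this reckoning long travel calls for more than three full revolutions at the 1 in. ratio but less than one at the 4 in. ratio, which is in keeping with the shorter travel times found for the coarser ratios at the long distance.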


Summary 


Performance on a linear scale as a function 
of four independent variables was investigated:
(a) reduced time intervals in which
to make a setting, (b) control ratio, (c) direction
of initial cursor displacement from
target, (d) distance of cursor travel. Twelve 
Ss participated and each was instructed to
make settings as fast and accurately as possible.
The size of the final discrepancy between
target and cursor (error) was measured
as was the time for travel to the approximate 
location of the target and the time for final 
adjustment. 


David C. Greek and Arnold M. Small, Jr. 


When ample time is allowed to make a 
setting, use of a relatively fine control ratio 
gives maximum accuracy; with limited time, 
a coarser control ratio gives maximum ac- 
curacy. 

The critical allotted time interval at which 
error magnitude increases rapidly is depend- 
ent upon both travel distance and control 
ratio. Reduced time to complete a setting 
may be partially compensated for by coarser 
control ratios or a reduced travel distance. 

Time taken by the S to complete a setting 
decreases with shorter allotted times and 
coarser control ratios, as well as with short 
travel distances. 


Received September 3, 1957. 
The Humm-Wadsworth Temperament Scale
(2, 4, 5, 7) consists of 318 items altogether. 
The choice of “Yes” or “No” for 164 of these 
items is taken to indicate the existence of 
seven personality components: normal (N), 
hysteroid (H), manic (M), depressive (D), 
autistic (A), paranoid (P), and epileptoid 
(E). Weights varying from 1 to 6 are al- 
lotted to indicative choices according to their 
diagnostic power (7). An item often belongs 
to more than one scale; several components 
may even share the same response alternative. 
Thus the choice of “Yes” for “Do you some- 
times feel cross or grouchy without special 
reason?” adds 3 to the strength of the D component
and 1 to the A component. While,
however, N is determined mainly by negative
choices and E by a more equal number of
negative and affirmative ones, maximum 
strength for the other components is ac- 
quired more or less exclusively by the choice 
of “Yes.” Thus, inevitably, the number of 
negative alternatives chosen in the H-W test 
will correlate positively with the strength of 
N and negatively with the strength of H, M, 
D, A, and P. In order to make test results 
comparable for people with differing num- 
bers of No-responses Humm has, therefore, 
introduced a correction for No-count plus 
other corrections diminishing the correlation 
(3, 5, 6). 
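
The scoring scheme just described can be sketched in Python. Only the first entry of the key below mirrors the example given in the text; the second entry and all names are hypothetical, and Humm's corrections for No-count are deliberately left out.

```python
from collections import defaultdict

# Hypothetical scoring key: item -> {response: {component: weight}}.
# Only "cross_or_grouchy" follows the example in the text; the other entry
# is invented to show a negatively keyed item.
SCORING_KEY = {
    "cross_or_grouchy": {"Yes": {"D": 3, "A": 1}},
    "hypothetical_item_2": {"No": {"N": 2}},
}


def raw_scores(responses):
    """Sum component weights for indicative choices and count the No-responses."""
    scores = defaultdict(int)
    no_count = 0
    for item, answer in responses.items():
        if answer == "No":
            no_count += 1
        # An item contributes only when the chosen alternative is indicative
        for component, weight in SCORING_KEY.get(item, {}).get(answer, {}).items():
            scores[component] += weight
    return dict(scores), no_count


scores, no_count = raw_scores({"cross_or_grouchy": "Yes",
                               "hypothetical_item_2": "No"})
print(scores, no_count)
```

Because most pathological components are keyed on “Yes,” any S who shifts toward “No” lowers those raw scores and raises the No-count at the same time, which is the dependence the corrections are meant to remove.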

The median for No-count in Humm’s sam- 
ples is around 167 (5). In a Swedish sample 
of 978 job applicants tested at 4 factories (by 
certified Humm-testers with the authorized 
Swedish version of the test) the No-count 
median was between 195 and 200, i.e., out- 
side the range which is considered acceptable 
by Humm and his associates (the standard 
deviation being approximately the same). 
Since Humm’s samples consisted of people
who did not depend on the test results for
their employment as did our sample, this difference
in test situation would apparently explain
the difference in No-count. Defenders
of the Humm test, aware of the serious implications
of such an explanation, have
replied that a great many applicants are
more maladjusted than people holding steady
jobs and are thus less open to questioning and
more apt to hide their faults and to choose
“No” for an answer. But since we found no
conclusive differences in our sample between
those who became employed after testing and
those who were rejected, we are hardly
willing to subscribe to such a “characterological”
explanation. Nor do we believe,
as Humm obviously does, that the
preference for normal indicators at the expense
of pathological ones which goes with a
high No-count is only an epiphenomenon, the
consequence of an S’s general, negativistic
bias for the very word “No.” It seems much
more sensible to reverse the argument and assume
that people applying for a job want to
appear as normal and desirable as possible
and that, owing to such an attitude and the
tendency of No-responses in this test to be
socially more acceptable, their No-counts automatically
increase over those for people who
are already holding a job and have nothing
serious at stake. If “Yes” were generally the
more acceptable alternative it would naturally
be chosen by those who now prefer
“No.”

1 We refer to all these corrections when talking
about No-count corrections below.

The main purpose of this paper is, there- 
fore, to test the hypothesis that the results of 
the H-W inventory are sensitive to the situa- 
tion of the S (cf. also 1). The confirmation 
of such a hypothesis will naturally affect the 
use also of other personality questionnaires, 
most directly those screening tests which have 
been standardized in a situation widely dif- 
ferent from the situation for which they are 
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recommended. More specifically, we also 
want to inquire into the concept of “response- 
bias” as described above and to control em- 
pirically if corrections based on such a ra- 
tionale can be effective. 


The First Experiment 
Subjects and Method 


This experiment should be considered preliminary 
and is in part a repetition of one performed by Giese 
and Christy and reported by Tiffin (8, pp. 170 f.). 
Twenty-six students were selected at random from 
two senior classes in a teachers’ college; 12 of them 
were men (aged 24.7 years) and 14 women (aged 
22.9 years). Two men had to leave before the sec- 
ond testing and were therefore excluded from the 
group. We chose these people as Ss because their 
level of education was about the same as for the
large sample referred to in the introduction. More- 
over, all students were well above the age limit set 
by Humm for the use of his test. 

The E gave the usual instructions for the H-W 
questionnaire but added the following sentences: 
“We only want to test the inventory—not you or 
your personality traits. This is a purely scientific 
experiment, and all results will be treated confidentially.
The college has nothing to do with this
testing. Teachers and other people on the staff will 
have no access to your results.” We call this situa- 
tion A. When an S had completed his test the fol- 
lowing instructions were placed before him to read: 
“We are sorry that we have to ask you to fill in the 
questionnaire once again. But now you should try 
to imagine that this is an attempt to examine your
teaching ability, i.e., your test results will be used as
a measure of your suitability as a future teacher.” 
This was the B situation. 


Differences Caused by the Change in Situation

Table 1 summarizes the differences in No-count
between the two situations. Since
women had fewer No-responses than men, we
treated the sexes separately. It is evident
that for both men and women the B situation,
even if its stress was not real, caused a
significant increase in No-count. These results
may be taken as a preliminary confirmation
of our hypothesis that the H-W test depends
on the test situation.


Table 1

No-Count Differences in the First Experiment

                                                       Comparison
Sex               Situation              No-Count      t        P

Men (n = 10)      “Confidential” (A)      175.7       2.70    .05-.02
                  “Applicant” (B)         198.9

Women (n = 14)    “Confidential” (A)      155.6       2.94    .02-.01
                  “Applicant” (B)         171.7


Two sets of results are presented in Table 2:
(a) raw scores derived directly from an S’s
response pattern, (b) profile values corrected
for No-count, etc., and transformed into a
21-point scale. While raw scores for N increase
from A to B and those for other components
decrease, this trend is reversed in N,
H, and P as far as profile values are concerned.
Such a tendency for No-count corrections
to affect some component values more
than others is also reflected in the integration
indices (Table 3), i.e., the sum of the differences
between profile values in N and each of
the other components. Although our Ss chose
19 additional “No’s” in B they got an integration
index of 30.0 as against 35.3 in A.
Humm’s corrections for an increase in No-count
do not seem to restore profile values in
B to the same level as in A.
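
The integration index as defined here can be written out in a few lines of Python. The sketch takes signed differences (N minus each other component), which is our reading of the definition; the profile values are invented and serve only to show the arithmetic.

```python
COMPONENTS = ("N", "H", "M", "D", "A", "P", "E")


def integration_index(profile):
    """Sum of (N profile value - profile value) over the other six components."""
    n_value = profile["N"]
    return sum(n_value - profile[c] for c in COMPONENTS if c != "N")


# Hypothetical profile values on the 21-point scale (not taken from the paper)
example_profile = {"N": 14, "H": 9, "M": 8, "D": 10, "A": 9, "P": 7, "E": 11}
print(integration_index(example_profile))
```

On this definition the index falls whenever the N profile value drops or the values of the other components rise, so a lower index in B means the corrected profiles look less "integrated" despite the additional No-responses.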


Table 2

Means of Raw Scores and Profile Values in the First Experiment

                                               Component
Scores            Situation              M        D        A

Raw scores        “Confidential” (A)    36.0     50.5     38.7
                  “Applicant” (B)       24.2     34.0

Profile values    “Confidential” (A)     8.2     12.5
                  “Applicant” (B)        6.7     11.7

Table 3

No-Counts and Integration Indices in the First Experiment

                                        Integration
Situation              No-Count         Index

“Confidential” (A)      164.0            35.3
“Applicant” (B)         183.0            30.0



There are further reasons to question the 
rationale upon which No-count corrections are 
based. Correlations between raw scores in A 
and B are all positive and all except one sig- 
nificant (Table 4). Since the increase in No- 
count varies considerably among our Ss the 
coefficients of correlation for raw scores are 
of an expected magnitude. Profile values in 
B, on the other hand, have been corrected for 
a change in “response-bias”; and correlations 
between profile values should thus increase as 
compared with raw score correlations. But 
there is no such trend in our results. 

In comparison with the correlations found 
by Giese and Christy (8, pp. 170 f.) our cor- 
relations are rather high, perhaps because our 
second testing was performed immediately 
after the first. Moreover, our sample may 
have been less willing to submit to imagina- 
tion. In spite of the positive correlation co- 
efficients, however, our results reflect rather 
marked changes from A to B. If the instruc- 
tions had remained the same in B, our co- 
efficients could have been regarded as an esti- 
mation of the retest reliability. We know 
that response patterns often differ from one 
testing to another because the attitude of the 
S to himself and to the test changes. But 
even a retest coefficient should exceed + .50 
considerably. There is no doubt that the reversal
of instructions has accentuated the difference
between test profiles in A and B.


The Second Experiment 
Subjects and Method 


This experiment represents an improvement over 
the preliminary one in that the B situation was made 
much more real. Two groups of Ss were selected 
at random from the three senior classes in another 
teachers’ college. There were 33 Ss in Group I (17 
men and 16 women) and 35 Ss in Group II (18 
men and 17 women). The average age for all of the 
four subgroups was very close to 24 years. In order 
further to control the sampling we compared the av- 
erage term characters (ranging from 1-5) in teach- 
ing, Swedish, mathematics, and athletics for the two 
groups. These means proved to be almost identical. 
The two main groups were tested at the same time 
in different classrooms. After reading the usual H-W 
instructions, E added the following remarks. 

Group I: See instructions for situation A above.

Group II: “We don’t want you to remain ignorant 
of the fact that this testing may be important for 
you. It is an attempt to examine your teaching 
ability experimentally, i.e., your test results will be
used as a measure of your suitability as future 
teachers.” 

Reactions to the introduction of the questionnaire 
differed considerably between the groups. In Group 
I, Ss worked in silence and with concentration and 
finished their task quickly. Group II received the 
instructions with dissatisfied murmurs, exchanged 
meaningful glances and were only too eager to criti- 
cize the questions. Some of the Ss even wanted to 
leave in protest but were prevented by the E. They
finished the test about half an hour later than Group 
I, obviously very tired and anxious. 


Differences in No-count and Component Values


Table 4

Correlations Between Situations A and B

Components: N, H, M, D
Raw scores, r* (P): +.92 (<.001), +.55 (<.01)
Profile values, r* (P): +.54 (<.01), +.52 (<.02), +.46 (<.02), +.43 (<.05)

* Product-moment correlation.


Table 5 includes a series of comparisons between
the two groups. The number of No-responses
is significantly higher in Group II,
for men as well as for women. In view of the 
fact that the group of applicants referred to 
in the introduction was predominantly mascu- 
line it is of special interest to note that the 
average No-count of men increases by 33
choices. The No-count in Group I is somewhat
higher than in Humm’s samples. But
the marked differences between the two groups
make our introductory assumption plausible
that the difference in No-count between our 
sample of applicants and Humm’s employed 
samples was to a high degree due to differ- 
ences in the general test situation. We also 
observe that the average No-count among the 
men in Group II is about the same as for 
male Swedish applicants. These results im- 
ply a further substantiation of our basic 
criticism against the H-W test: that it is not 
warranted to apply norms standardized for 
employees to the results of job applicants. 
The change in No-count is reflected in the 
raw scores. While N increases and E remains 
relatively unaffected, raw scores for the other 
components decrease. There are more signifi- 
cant differences for men than for women be- 
cause the change in No-count was less for the 
latter category. The groups differ most mark- 
edly with respect to D and A. Even if N 
ought to be affected by a changed “response 
bias” about as much as A, the difference be- 
tween I and II is hardly significant; this re- 
sult is in agreement with the high intersitua- 
tional correlation reported for N in the previ- 
ous experiment. Differences in profile values 
also show about the same trends as before: 
while thus N and A (in the male subgroup) 
diminish significantly, the average score for P 
increases. Humm’s corrections apparently do 
not fit the changes in raw scores which go 
with such a substantial enhancement in No- 
count as reported here; and we turn to an 
analysis of individual items in order to illus- 
trate more clearly how the excessive No-re- 
sponses are distributed over the components. 


Differences in Responses to Individual Items 


In this analysis we did not keep men and 
women apart but compared only the two 
main groups. All P values reported for dif- 
ferences between choice distributions in the 
groups were based on the Mosteller-Tukey
graphic χ² calculations. Before knowing anything
about the results of the analysis, the
present authors tried to guess which items 
should be most affected by the difference in 
test situation introduced here, i.e., we scruti- 
nized the content and formulation of each 
question, especially those with an obvious 
moralistic bias, for how much a “Yes” or 
“No” was likely to clash with the ideal image 
of a teacher. 

Figure 1 shows the percentage of indicative 
alternatives chosen by the groups. The main 
differences between I and II are in line with 
those reported for raw scores above. But al- 
though the distribution of indicative choices 
is rather similar in H, M, D, A, and P (“Yes”
most often indicates the existence of the typical
“disposition”) the change in their relative
cal “disposition”) the change in their relative 
numbers from I to II is not the same. The 
decrease is most evident for A, the raw scores 
and profile values of which diminish accord- 
ingly. P, with the same percentage of indica- 
tive choices in Group I, is much less affected; 
and the obvious result is, in spite of diminish- 
ing raw scores, that profile values increase in 
Group II. This is the effect of No-count cor- 
rections when the change in No-count is re- 
lated to more change in A and less in P than 
implied in Humm’s equations. The slight in- 
crease in N also explains why profile values 
are lower in Group II. 

Fig. 1. Percentages of indicative alternatives chosen
by Groups I and II.

As many as 59 choice distributions for individual
items changed so much that the difference
between the two groups became significant
beyond the .05 level. The present authors
picked out 74 items which they believed
would be especially sensitive to the difference 
in test situation. If the 59 significant items 
were randomly distributed over the entire 
scale 13.7 would fall among the 74 items just 
mentioned. But our guessing, even for the 
direction of change, was correct in 29 cases. 
The probability of arriving at this number of 
correct guesses was less than .01 (χ² = 6.73).
This result tends to support our hypothesis 
that the “extra” No-responses in Group II 
concern such items for which a “Yes” would 
be socially inopportune. 
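
The chance expectation of 13.7 quoted above follows from simple proportionality, as the Python sketch below shows. The function name is ours, and the sketch reproduces only the expected count, not the χ² computation.

```python
def expected_overlap(n_items, n_significant, n_predicted):
    """Expected number of significant items among the predicted ones if the
    significant items were scattered at random over the whole inventory."""
    return n_significant * n_predicted / n_items


expected = expected_overlap(n_items=318, n_significant=59, n_predicted=74)
print(round(expected, 1))   # 13.7, against 29 correct guesses observed
```

The 29 correct guesses are thus roughly twice the number that random scatter of the 59 significant items would place among the 74 selected items.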

If the difference in instructions had been of 
no importance we might have expected a sig- 
nificant change (P < .05) in 15.9 items out 
of 318, or, in 8.2 of the 164 indicative items. 
Consequently, there can be no doubt that the 
test situation influenced response patterns. 
But the 59 (36 for indicative items) signifi- 
cant differences between the groups do not 
imply only that Group II tried to avoid affirmative
answers and thus proved to be sensitive
to situational stress. If the additional
No-responses had been randomly distributed
over all items we would have expected about
64 (33) significant differences, but only 13
(7) beyond the P level of .01 and 1.3 (.7)
beyond .001. Since we found 28 (16) differences
to be significant beyond .01, 9 (4)
of them even beyond .001, our results indicate 
instead, that Group II concentrated on a 
limited number of items. 
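
The chance figures of 15.9 and 8.2 used in this argument are simply the number of tests multiplied by the significance level. The Python sketch below reproduces them and sets the observed counts for all 318 items beside the corresponding chance values. The chance values at .01 and .001 are our own arithmetic, not figures from the paper, and they differ from the 13 and 1.3 quoted above, which rest on the authors' model of randomly distributed additional No-responses.

```python
N_ITEMS = 318
N_INDICATIVE_ITEMS = 164

# Observed numbers of significant differences for all items, from the text
OBSERVED_ALL_ITEMS = {0.05: 59, 0.01: 28, 0.001: 9}


def chance_expectation(n_tests, alpha):
    """Number of tests expected to reach level alpha if no real effect exists."""
    return n_tests * alpha


print(round(chance_expectation(N_ITEMS, 0.05), 1))             # 15.9
print(round(chance_expectation(N_INDICATIVE_ITEMS, 0.05), 1))  # 8.2

for alpha, observed in OBSERVED_ALL_ITEMS.items():
    expected = chance_expectation(N_ITEMS, alpha)
    print(f"P < {alpha}: observed {observed}, chance expectation {expected:.1f}")
```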


Summary and Conclusions 


This paper was intended to examine the 
sensitivity of the Humm-Wadsworth Tem- 
perament Scale to the test situation. One 
group of 24 Ss was tested, first in a “clinical”
situation and then in an “applicant” situa- 
tion. We also compared two groups of 33 
and 35 Ss, respectively, randomly selected 
from a larger group, each of which was tested 
in one of the above situations. The results of 
the two experiments were essentially similar. 
The number of No-responses increased significantly
in the “applicant” situation, especially
when it was made more real in the
second experiment, and reached about the 
same level (near 200) as in a Swedish group 
of 978 job applicants. The median number
reported by Humm for his samples was 167. 
The increase in No-count implied an increase 
in raw scores for the normal component and 
a decrease for the “pathological” components. 
Even profile values in the stress situation, 
which were corrected for the additional No- 
responses, differed from those in the control 
situation. An analysis of responses to indi- 
vidual items revealed that this inability of 
Humm’s corrections to restore profile values 
to the “control” level was due to the fact that 
an increase in the number of No-responses implied
a change in response patterns vis-à-vis
selected components and items. 

Humm and his associates have standardized 
their instrument in samples of people who 
were already employed and had thus nothing 
serious at stake when tested. But they rec- 
ommend the questionnaire for use in situa- 
tions where results have proved to be signifi- 
cantly different. Corrections for No-count, 
established in a nonapplicant sample, are sup- 
posed to compensate for these differences. 
But Humm’s corrections, necessarily, were 
built upon the assumption that when the 
number of No-responses exceeded the norm 
they would be proportionally dispersed over 
the seven components.* If this assumption 
had been correct, a frequent choice of “No” 
would result only in a narrowing of the dis- 
tribution of profile values as demonstrated for 
a group of 508 Swedish job applicants. But 
the results presented here suggest, in addi- 
tion, that the cause of a changed response 
pattern in a stress situation was not an in- 
tensification of a general, negativistic bias for 
the word “No” but rather an increased pref- 
erence for socially acceptable answers to a 
number of sensitive questions. Our inevitable
conclusion must therefore be that test profiles
from the H-W scale, even if corrected
for No-count, often include too many uncertainties
to be accepted as indicative of a person’s
temperament.³

2 There is an approximately linear relationship in 
Humm’s corrective nomogram between high No- 
count, high raw scores for N and low for H-P on 
the one hand and profile values on the other. 

3 In order to simplify matters, we have not presented
the mathematical formulas upon which
Humm’s corrections were based, only the rationale
for the very attempt at such corrections. Since this

This conclusion may very well apply also
to other questionnaires where Ss are asked to 
describe their own behavior. The widespread 
use of these instruments can often be justified 
because they are simple and direct and re- 
quire a minimum of experimental prepara- 
tion and theoretical ramifications. After this 
analysis of one of the more widely used in- 
ventories the present authors tend to be 
biased in favor of either more projective tech- 
niques of questioning or of more rigorous ex- 
perimental procedures for personality (and 
personnel) testing. 


Received September 10, 1957.


The Internal Consistency of the Humm-Wadsworth
Temperament Scale 


Gudmund Smith and Sven Marke 
University of Lund, Sweden 


The Humm-Wadsworth inventory (3, 4) 
has not been subjected to any analysis of the 
homogeneity of its scales. The main reason 
for this failure to control the internal consist- 
ency of a widely used screening test seems to 
be that methods of scale analysis, at least in 
personality test construction, have rarely been 
applied until the last decade. Traditionally, 
only problems of reliability and validity were
regarded as crucial for personality question- 
naires. When test instruments are intended 
to measure very general forms of behavior, 
as, e.g., the ability to cooperate or to lead, 
an analysis of internal consistency, naturally, 
may be rather superfluous. Since such be- 
havior patterns rarely reflect consistent per- 
sonality variables, test instruments con- 
structed for their evaluation must be made 
up of items which are rather loosely connected 
with each other. Two individuals who reply 
differently to a number of questions included 
in such a test need not differ, for instance, 
with respect to their fitness as leaders. Here, 
lack of one-dimensionality does not exclude 
high validity. But problems of internal con- 
sistency ought to be important for those per- 
sonality inventories which claim to measure 
the strength of basic personality variables and 
their mutual interrelation. 


Considerations Regarding the Humm- 
Wadsworth Test 


The H-W test intends to measure seven 
components of temperament described on the 
basis of Rosanoff’s textbook in psychiatry (7). 
There are also a number of subcomponents, 
31 altogether. But since these are to be re- 
garded as modified manifestations of the basic 
tendencies, we will concentrate our analysis 
on the seven main components: normal (N), 
hysteroid (H), manic (M), depressive (D), 
autistic (A), paranoid (P), and epileptoid 
(E). Considering the symptomatic character 
of this classification we should expect some
correlation between items representing the
same component. This does not mean that 
we demand complete homogeneity of a collec- 
tion of items constituting a component scale: 
the theory allows some variation in symptoms 
indicating a latent personality tendency and 
its intensity and, therefore, necessitates that 
scales include questions which differ from 
each other to some extent with regard to 
their type and aims. One might perhaps de- 
mand the same degree of internal consistency 
of the seven scales in the H-W test as social 
psychologists demand of their attitude scales. 
If the scales are homogeneous, one can con- 
clude, with reasonable certainty, that they are 
also reliable—but the reverse does not hold. 
In any case, the H-W scales should not 
differ considerably among themselves with re- 
spect to internal consistency. If that were 
the case, the claims of the H-W test to meas- 
ure the primary components of personality 
structure must be invalid since, proceeding 
from Rosanoff’s theory or any other theory 
of the same kind, one can hardly presume that 
some components would be represented by less 
consistent symptomatic patterns than others. 
Big differences in internal consistency among 
the scales will obviously mean that some 
scales measure mutually more connected 
symptoms than others and that the latter, 
probably, have been based upon superficial, 
vague, and ambiguous definitions of the com- 
ponents, definitions which partly, in the case 
of the H-W test, depend on the choice of 
clinical groups for the validation (3). 


The Statistical Basis for the Scale Analysis 


Our scale analysis was based upon a method 
originally developed by Likert (5). In this 
study we used the following procedure. From 
a group of 508 job applicants at two Swedish 
companies, including all male Ss tested by 
certified Humm-testers with the authorized 
version of the 1954-55 revision of the H-W
test, we selected the upper and lower quartiles
with respect to the profile values for each
component, i.e., the raw scores corrected for
No-count and transformed into a 21-point
scale. Since an exact 25% selection was impossible
with a discrete variation, we tried to
get a total selection as close to 50% as possible
(Table 1). The next step in our scale
analysis was to compute the average
value for each item in the two extreme
groups. The choice alternative indicating the
component was counted as +1 and the other
alternative as 0. If we subtracted the average
value for an item in the low group (M_L)
from its average value in the high group
(M_H), we got the value of DP (the discriminatory
power), or, an estimation of how
strongly an individual item correlated with
the entire scale.
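
The procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the rule for picking cutoffs (the score thresholds whose group shares come closest to 25%), and the toy data are ours, not the authors'. Responses are coded 1 for the alternative indicating the component and 0 otherwise.

```python
from typing import Sequence, Tuple


def pick_cutoffs(scores: Sequence[int], target: float = 0.25) -> Tuple[int, int]:
    """Return (low_cut, high_cut) for a discrete score distribution.

    Low group: score <= low_cut; high group: score >= high_cut.
    Each cutoff is the score value whose group share is closest to `target`.
    (Hypothetical selection rule; the paper only says "as close as possible".)
    """
    n = len(scores)
    values = sorted(set(scores))
    high_cut = min(values, key=lambda v: abs(sum(s >= v for s in scores) / n - target))
    low_cut = min(values, key=lambda v: abs(sum(s <= v for s in scores) / n - target))
    if low_cut >= high_cut:
        raise ValueError("score distribution too narrow to form two extreme groups")
    return low_cut, high_cut


def discriminatory_power(scores: Sequence[int], item: Sequence[int],
                         target: float = 0.25) -> float:
    """DP = mean item value in the high group minus mean in the low group."""
    low_cut, high_cut = pick_cutoffs(scores, target)
    high = [r for s, r in zip(scores, item) if s >= high_cut]
    low = [r for s, r in zip(scores, item) if s <= low_cut]
    return sum(high) / len(high) - sum(low) / len(low)


# Toy data: eight Ss, profile values 1..8, one item coded 0/1.
print(discriminatory_power([1, 2, 3, 4, 5, 6, 7, 8], [0, 0, 0, 1, 0, 1, 1, 1]))  # 1.0
```

With tied scores the extreme groups cannot hit 25% exactly, which is the situation the text refers to when it speaks of a discrete variation.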

A DP value is naturally not the same as a
conventional coefficient of correlation. But
we know, on the other hand, that it can be
regarded as a good estimation of that coefficient
(9). Moreover, we have to point out
some weaknesses in the Likert method. The
one-dimensionality of a scale can be measured
reliably only by a factor analysis. A DP
value, as a matter of fact, implies nothing
but an estimation of the first centroid factor.
The regression is practically linear (r > .80
according to Ekdahl [2]; cf. also [6]). Consequently,
a scale with high DP values may
include items which are mutually uncorrelated.
In view of the practical impossibility
of factor analyzing this material, however, we
decided to be content with a Likert analysis.
This does not imply an unfair treatment of
the H-W instrument in the sense that the
yardstick used for our criticism is too heavy.
Instead, the demands of the Likert method
on the internal consistency of the scales are
rather too small; and it will, therefore, exonerate
the scales from a number of existing
inconsistencies which it cannot lay bare.


Table 1

The Number of Ss in the High Group and
Low Group for Each Component

Group            Component
% high-group     29.6   27.8   32.5   19.2   22.0   17.9   23.6
N                149    140    164    97     111    90     119
% low-group      24.6   19.0   19.6   26.4   26.0   29.8   27.8
N                124    9      99     133    131    150    140


Only when one response alternative has 
been chosen much more often than the other, 
the DP values may erroneously become too 
low. For this Ekdahl has proposed a correc- 
tion. We arrived at the original DP values 
by means of the following equation: 


\[ DP = M_H - M_L. \]


But we computed, in addition, the best dif- 
ference possible with the choice distributions 
given in the entire group. Let us assume that 
the high group represents X% of the total 
group, and that Y% of it chose the alterna- 
tive indicating the component. In case Y is 
bigger than X, the best possible difference 
will be + 1.0. But if Y is less than X this 
figure will diminish at the same time as the 
difference between Y and X increases. The 
reverse reasoning applies to the low group: if 
the number of individuals selected for this 
group is bigger than the number of nonindicative
choices, the best possible value will exceed
0. We use m_H to stand for the best possible
value in the high group and m_L for the
corresponding value in the low group. The
corrected DP value was then computed in the 
following way. 


\[ DP_c = \frac{M_H - M_L}{m_H - m_L}. \]
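
As a rough illustration of this correction, the sketch below computes the best possible group means from the choice counts and divides the observed difference by their difference. It reflects our reading of the verbal description only: the function names and the example counts are hypothetical, and responses are assumed to be coded 1 (indicative) or 0 (nonindicative).

```python
from typing import Tuple


def best_possible_means(n_total: int, n_high: int, n_low: int,
                        n_indicative: int) -> Tuple[float, float]:
    """Best attainable item means given the overall choice distribution.

    m_high: largest share of the high group that could have chosen the
            indicative alternative (capped by how many Ss chose it at all).
    m_low:  smallest share of the low group that must have chosen it
            (above 0 only when the low group outnumbers the nonindicative choices).
    """
    n_nonindicative = n_total - n_indicative
    m_high = min(n_indicative, n_high) / n_high
    m_low = max(0, n_low - n_nonindicative) / n_low
    return m_high, m_low


def corrected_dp(mean_high: float, mean_low: float, n_total: int,
                 n_high: int, n_low: int, n_indicative: int) -> float:
    """Corrected DP: observed difference divided by the best possible difference."""
    m_high, m_low = best_possible_means(n_total, n_high, n_low, n_indicative)
    denominator = m_high - m_low
    if denominator <= 0:
        raise ValueError("no variation in responses; corrected DP is undefined")
    return (mean_high - mean_low) / denominator


# Skewed item: only 10 of 100 Ss chose the indicative alternative,
# so a high group of 25 Ss can reach a mean of at most 10/25 = .40.
print(best_possible_means(100, 25, 25, 10))            # (0.4, 0.0)
print(round(corrected_dp(0.32, 0.04, 100, 25, 25, 10), 2))
```

In the example the uncorrected DP of .28 rises to about .70, whereas for an evenly split item (50 indicative choices) the denominator is 1 and the corrected value equals the original one.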

If the number of Ss included in the extreme
groups is less than the number of
choices of both alternatives, the denominator
will naturally be +1.0. But in case distributions
are so skew that the number of high
group or low group Ss exceeds the number of
choices of either the indicating alternative or
the nonindicating one, the denominator will
be less than +1 (because m_H diminishes or
m_L increases). And DP_c will increase in relation
to DP. Thus a DP value for an item
where choice distributions are skew can be
improved by our correction. We have to underline,
however, that even DP_c values should
not be generalized to stand for a sample with
a lower No-count than this one (the number
of No-responses was close to 200). It is also
important to remember that DP_c values for
items where very few Ss have chosen one of
the response alternatives are naturally very
unreliable.


Results of the Scale Analysis


Figure 1 summarizes the first results for
individual items in the component scales.
There are two broken lines in the diagram,
one for DP_c values of .20 and one for values
of .40. The first line is adapted to what
Likert (5) considered to be the lower limit
for DP values in a reasonably homogeneous
scale (.75 for a five-point scale) and the
second line to the more rigorous demands
made by Ekdahl after his analysis of related
problems (op. cit.). It is only too evident
that the scales differ from each other with respect
to internal consistency. While M and
A may be considered fairly acceptable if
Likert’s more lenient norms are applied, N
appears to be very heterogeneous.

Fig. 1. DP_c values for the seven components in the
Humm-Wadsworth Temperament Scale.

We also tested statistically the differences
between the scales. A includes very high DP_c
values as well as very low ones. The left part
of Fig. 2 indicates that this scale differs from
all the other scales with respect to the deviation
of these values. As far as the
average DP_c values for the scales are concerned,
N has a lower value than the “pathological”
components. On the right part of Fig. 2, we can
read that M differs from E, P, H, N; A and
D from H and N; and E, P, H from N.
These obvious differences, as we have said already,
constitute a serious warning for the component
theory, which presupposes that the
scales represent logically and psychologically
comparable components. We know that the
final inclusion of items in the seven preconceived
scales was motivated exclusively by
the results of a validation study (3, 4). The
homogeneity of a scale thus depended on how
the diagnosticians conceived and applied the
component descriptions. Since the Rosanoff- [...]
relatively unitary psychiatric syndromes (autistic,
manic, depressive) and in part to mixtures of
several principles of classification (paranoid,
epileptoid, hysteroid, normal) we were not
surprised to learn that the internal consistency
of the scales varied considerably.

Fig. 2. Differences between the component scales in
the deviation of DP_c values and their means.


The Normal Component


N is the most inconsistent and theoretically
least acceptable of the scales. There are no
reliable DP_c values above .40, and above .20
only about 25% of them. Among items 
(Table 2) with high DP_c values one may
trace some correspondence in content. Many 
indicative choice alternatives seem to concern 
a good measure of self-reliance (255, 256, 
287), an emotional balance going with it 
(287, 109), and lack especially of paranoid 
defense mechanisms (279, 14, 255, 109). But 
several of the worst items aim at the same 
vague behavior complex (80, 155, 199, 302). 
One reason for the lack of homogeneity in 
the N scale seems to be that many questions 
are formulated in such an ambiguous manner 
that S is forced to choose his answer more or 
less at random; another reason is a rather 
transparent moralistic bias in some of the 
items which is apt to make an S’s choice in- 
consistent with responses to more neutral 
questions; a third reason may be the mixture 
of normal tendencies and more or less com- 
pulsive ones (32, 280, 94, 184, etc.). 


The Hysteroid Component 


This scale also belongs to the inconsistent 
ones. Only 4 DP_c values are higher than .40.
Most of them, however, exceed .20. The 
worst items in H, as a rule, have very little 
in common with hysteroid symptoms but 
rather with autistic (88) and paranoid ones 
(302, 42), with asthenic working habits (197),
or, with general norms of ethics which may 
differ markedly even between seriously “ethi- 
cal” individuals (234, 226). The best items 
in the scale aim at many typically hysteroid 
traits: an inclination to exhibitionist behavior 
(317, 245) and gambling (75, 273), a lack of 
objectivity and permanent values (186, 84, 
104, 284) together with a basic indifference 
and cynicism vis-à-vis other people (195, 165,
27, 258). Several of them are projective (27, 
258, 214, 94, etc.). But the choice of indica- 
tive alternatives to these questions need not 
imply a hysteroid character. A scrupulously 
honest person will certainly affirm questions 
84 and 297 (instances of bad conduct and 
nonattendance in school), and a naively rough 
person 317 and 245 (practical jokes and so- 
ciability). Since the self-image of hysteroid 
people tends to be characterized by consistent
falsification one can hardly expect that they,
among all people, will choose the indicative 
alternatives. 


The Manic Component 


There are only 3 DP_c values below .20 in
this relatively acceptable scale. Some of the 
most inconsistent items belong to a subscale 
intended to measure the degree of euphoria.
But this applies only to item 112 while other 
questions (267, 47, 156) refer to irritability 
and the rest of them (71, 119) to emotional 
adaptation and identification. On the whole, 
however, M consists of items aiming at cyclic 
emotionality and needs of contact. But since 
M and D share a great number of items we 
may ask why they have not been conjoined 
into one component. A preliminary analysis 
of the DP values in M as computed on the
basis of extreme groups in D suggested that 
the two scales were highly interrelated, as are 
the manic and depressive symptoms in psy- 
chopathology. 


The Depressive Component 


D, too, belongs to the acceptable scales.
Among items with low DP_c values we find
several without specific relations to depressive 
tendencies: shyness (3), impatience (304, 
139), aggression toward authorities (151), 
refusal to play in a new game (23), or, “Do 
certain animals make you nervous?” (35). 


The Autistic Component 


Here we find DP_c values of all magnitudes,
most of them high enough to be acceptable. 
Two of the least acceptable groups of items 
were summarized under such seemingly rele- 
vant headings as feelings of inferiority and 
narrow interests. But the choice of “yes” to 
39 (more intense feelings than in other peo- 
ple) need not be a sign of inferiority feelings; 
and a choice of “yes” to 23 (see above) or to 
151 (an inclination to oppose overbearing peo- 
ple) not a sign of autism. And items 249 and 
169 (difficulties in relaxing and a tendency to 
get stuck in details) would probably be more 
relevant in a scale measuring compulsive or 
asthenic symptoms. Items on the top of the 
scale of DP_c values refer to more adequate
symptoms such as shyness, timidity, contact
difficulties, etc. 


The Paranoid Component 


While most items fall above the .20 line, 
only three of them exceed .40. Most of the 
best items refer to paranoid characteristics as 
we usually understand them: jealousness, sus- 
piciousness, rigidity, contempt for the opinion 
of others. Projective formulations are not 
uncommon (24, 58, 148, 94, etc.). Toward 
the bottom of the scale, however, the unrelatedness
of items becomes obvious. In what
way do scurrying work habits refer to paranoid
tendencies (48), or an S’s inclination to
limit his human contacts (72)? Some items
seem to have been hampered by evaluating 
formulations (202, 226). Another plausible 
cause of the inconsistent response patterns of 
our Ss is the linking of aggression and self- 
assertion on the one hand with projective de- 
fense mechanisms on the other. The paranoid 
pattern may very well be passive and anx- 
iously submissive; and aggression, for that 
matter, is typical of many other neurotic com- 
plexes. 


The Epileptoid Component 


Close to one-third of all items got unacceptable
DP_c values. Those items with relatively
high DP_c values (we exclude a number
of neurological questions where choice distri- 
butions are extremely skew) have very little 
in common except a certain prosaic attitude 
toward life. The central symptoms of the 
component have partly disappeared in the 
background: inspirational fixation to pur- 
poses, meticulous precision in their execu- 
tion, etc. An important reason why the epi- 
leptoid component is not more homogeneous 
might be that Rosanoff relied too heavily on 
the classical theory, now abandoned (1), that 
epileptics constitute a special group even tem- 
peramentally, a theory which guided the 
choice of the epileptoid group in the valida- 
tion study (3, 4). 


Summary and Conclusions 


This paper is a study of the internal consistency
of the seven scales in the Humm-Wadsworth
inventory carried out in a group
of 508 male applicants for industrial work by 
means of a revised Likert analysis. Our re- 
sults show beyond doubt that the H-W test 
as a whole hardly fulfills even quite lenient 
demands for one-dimensionality. The manic, 
autistic, and depressive components tend to 
hang together, at least in parts, and might be 
improved if a number of items were refor- 
mulated or taken out of the scales. But those 
groups of questions which are supposed to 
measure the four remaining components can 
hardly be designated as scales. 

Some inconsistencies were perhaps stressed 
by our choice of sample. If the number of 
No-responses had been as low as in Humm’s 
samples (median around 167) it is possible 
that some DP values would increase. But we 
want to point out that all DP values were 
corrected for the skewness in the distribution
of responses which has proved to be one
consequence of a high No-count (8). There 
was, however, no way of curing those incon- 
sistencies in the choice of alternatives which 
were likely to appear in a group of job ap- 
plicants, i.e., a group where Ss, as we have 
shown in another study (8), tried to compose 
a more or less desirable image of themselves. 
But our task was not to study the H-W test 
under those optimal conditions where it was 
standardized but under those conditions for 
which it has been recommended—as a screen- 
ing tool. 

Moreover, our results indicate that the lack 
of internal consistency found in our study is 
hardly due to the choice of Ss but rather to 
the vague or ambiguous definitions of the 
components; the clumsy manner in which sev- 
eral items, otherwise often acceptable, were 
formulated; a frequently recurring moralistic 
bias distorting the neutral, exploratory aim of 
the inventory; and, last of all, the uncritical 
empiricism which guided the choice of items 
for the scales and allowed no further revisions. 


Received September 10, 1957.


The Strong VIB has established a firm
place for itself in the assessment of vocational 
interests. Moreover, it has been found a 
useful source of information about additional 
dimensions of personality when subjected to 
clinical analysis in counseling and placement 
situations (6). It was felt that this informa- 
tion might be more systematically and objec- 
tively exploited by the development of new 
scales for the Strong. 

Level of adjustment, of general importance 
in the counseling or placement of an indi- 
vidual, was considered a dimension whose 
measurement by the Strong would be desir- 
able. In the phenomena of neurosis, theorists 
have given anxiety a central position, and em- 
pirically there is difficulty in differentiating 
measured “anxiety” from measured “neurot- 
icism” (12, 13, 24). Since the criterion 
measures used for the development of the 
scale reported upon here are considered meas- 
ures of anxiety and are significantly corre- 
lated with other measures of anxiety (3, 4, 
7, 14, 19, 20, 26), the present scale has been 
labeled anxiety. However, it is expected that 
its correlation with measures of “maladjust- 
ment” or “neuroticism,” etc., or variables un- 
derlying these complex phenomena will be as 
high as its correlation with other anxiety 
measures. 


Development of the Scale 


Both the theoretical relationship between 
neuroticism and anxiety and a reported cor- 
relation of .74 (13) between the Taylor MAS 
(22) and the Winne Scale of Neuroticism 


1 This scale was developed by Garman in a dis-
sertation carried out under the guidance of E. Lowell 
Kelly and submitted in partial fulfillment of the re- 
quirements for the degree of Doctor of Philosophy 
at the University of Michigan. 


(27) led to the decision to use a combination 
of these two MMPI scales as a criterion meas- 
ure. The eight items in common² to these
two scales were represented only once in the 
criterion measure. The correlation between 
these two scales for the graduate psychology 
group used in the present study was .65 with 
the item overlap, .54 when common items 
were divided between the two scales. The 
former figure is closer to coefficients quoted in 
more recent studies (11, 16, 26) than it is to 
the .74 originally quoted by Holtzman, Calvin, 
and Bitterman (13). 

The Strong and MMPI answer sheets were 
available for approximately 400 Ss who had 
participated in an assessment study on the 
prediction of performance in clinical psychol- 
ogy reported by Kelly and Fiske (15). Most 
of the Ss were first-year graduate students in 
psychology. This pool of Ss was randomly 
divided and the first half was used for de- 
velopment of anxiety indices for the Strong. 
The second half was then used for cross- 
validation of these indices. In addition, in 
order to test the indices on a more hetero- 
geneous sample, Strong and MMPI answer 
sheets were obtained for 200 male entering 
freshmen from the University of Minnesota.³

As might be expected, the distribution of 
Taylor-Winne scores had a high positive skew 
for both college groups. With a possible 
score range of 0 to 72, the mean and stand- 


2 The eight items, as numbered in the MMPI
booklet are: 43, 107, 186, 190, 191, 238, 242, 263. 
Other workers have referred to six items common 
to the MAS and the Winne. This may have been 
due to a confusion of the item numbers in Taylor’s 
“Biographical Inventory” and their numbers in the 
MMPI booklet. 

3 These data were obtained through the courtesy
of Ralph F. Berdie and the Student Counseling Bureau
of the University of Minnesota.

ard deviation for the graduate group were 8.2
and 5.8, respectively. The highest score for 
this group was 35. For the freshman group, 
the mean was 14.3, the standard deviation, 
8.7. 

A variety of approaches was utilized in ex- 
ploring the Strong for anxiety indices, includ- 
ing empirical methods and methods based on 
a priori rationales, analysis of individual items 
and pattern analysis. The conventional item 
analysis method, utilizing high and low cri- 
terion Taylor-Winne groups, yielded the best 
measure. The resulting scale consists of 46 
responses to 33 items.⁴

The possible score range for the new Anxiety
Scale is from -22 to +24. For the
freshman group, which probably constitutes
the most appropriate reference group, the
mean Strong anxiety score was -4, the
standard deviation, 5. The figures for the 
graduate group were very close to these. The 
split-half reliability of this scale, calculated 
for the cross-validation group of graduate 
students was .73. 

When Strong Anxiety scores of the graduate 
cross-validation group were correlated with 
criterion scores, a coefficient of .36 was ob- 
tained. Correction for attenuation due to 
unreliability of the scales raises this coefficient 
to .44. A validity coefficient of .42 with the 
Taylor-Winne score on the MMPI was ob- 
tained for the cross-validation group of col- 
lege freshmen. Correction for attenuation 
raises this to .51. 
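
The corrected coefficients can be related to the reported reliabilities with a short calculation. The sketch below assumes that the classical Spearman correction for attenuation was used (the text does not name the formula) and that the split-half value of .73 applies to the Strong Anxiety Scale in both groups; the reliability of the Taylor-Winne criterion is not reported, so the code only backs out the value those assumptions would imply.

```python
import math


def correct_for_attenuation(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Spearman's correction: r_xy divided by the square root of the product
    of the two reliabilities."""
    return r_xy / math.sqrt(rel_x * rel_y)


def implied_reliability(r_xy: float, r_corrected: float, rel_x: float) -> float:
    """Reliability of the second measure implied by an observed and a
    corrected coefficient (the same formula solved for rel_y)."""
    return (r_xy / r_corrected) ** 2 / rel_x


SCALE_RELIABILITY = 0.73  # split-half figure reported for the graduate group

# Graduate cross-validation group: .36 corrected to .44.
rel_grad = implied_reliability(0.36, 0.44, SCALE_RELIABILITY)
# Freshman group: .42 corrected to .51 (assumes the same .73 applies here).
rel_fresh = implied_reliability(0.42, 0.51, SCALE_RELIABILITY)

print(round(rel_grad, 2), round(rel_fresh, 2))
print(round(correct_for_attenuation(0.36, SCALE_RELIABILITY, rel_grad), 2))
```

Under these assumptions both pairs of figures point to a criterion reliability a little above .90, which is at least consistent with the two corrections having been made in the same way.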


Correlations with MMPI Scales 


In addition to its relationship to the cri- 
terion variable, some relationships between 
the Anxiety Scale and other measures can be re-
ported here. The MMPI scores of the fresh- 
man group were available and these corre- 
lated with the Strong Anxiety Scale as shown 
in Table 1. It should be remembered that 
the Anxiety Scale was developed with MMPI 
items as the criterion. 

Table 1

Correlations Between the Strong Anxiety Scale and
MMPI Scores

200 College Freshmen

MMPI Scale      r
F               .26**
L              -.25**
K              -.22**
Hs              .18**
D               .33**
Hy             -.12
Pd              .19**
Mf              .19**
Pa              [illegible]
Pt              .42**
Sc              [illegible]
Ma              .08
Si              .37**

* Significant at the 5% level.
** Significant at the 1% level.


It is perhaps not surprising that the highest
correlation should be with the Psychasthenia
Scale, since high enough correlations between
the MAS and the Psychasthenia Scale have
been reported (2, 9) to make it appear that
they are measuring essentially the same thing.

4 The items composing the scale, with their scoring
weights, are available from the Bureau of Psychological
Services, University of Michigan.


As to the meaning which these relationships 
to MMPI scales lend to the Strong Anxiety 
Scale, one might postulate the sensitivity of 
this scale to a general constellation of: aware- 
ness of and attention to internal processes, 
both psychic and somatic, particularly when 
these are unpleasant or disturbing; a tendency 
to be aware of external presses and the emo- 
tional impacts of interpersonal processes in 
general; and a tendency to be frank in ad- 
mitting or reporting these phenomena. 


Discussion 


One crude attempt to place the relation- 
ships demonstrated here in a theoretical con- 
text might be somewhat as follows: Differ- 
ences may exist in people to the extent to 
which they are sensitive to external objects 
or events and to the internal representation 
of the impact of these objects or events. Also 
the internal response to the external environ- 
ment may be via thought and feeling processes 
and/or via somatic reactions. (For example, 
whereas one person may respond to psycho- 
logical threat with a feeling of fear or anx- 
iety, another may become aware that his 
stomach is “upset.” Our “anxiety” scales
have items that represent both.) To go one
step further, given sensitivity or responsive- 
ness to external phenomena, one’s internal re- 
sponse may be a pleasant or an unpleasant 
one. If the internal response is unpleasant 
and one is aware of it, he can find items in 
an “anxiety” scale which will allow him to 
communicate this. Also, if his internal ex- 
periences have been more unpleasant than 
pleasant, he may come to fear them and be 
fearful, shy, and suspicious in interpersonal 
relationships (paranoia), may avoid rather 
than seek new experiences (lack of adven- 
turousness), refrain from pushing the further- 
ance of his own aims or desires in interper- 
sonal contexts (lack of dominance and en- 
durance), and wish that people would be 
kinder to him and act in such a way as to 
produce more pleasant internal experiences 
for him (need for succorance). 

It may be noted that the MAS, which 
served as part of the criterion measure for 
the Strong Anxiety Scale, was developed for 
use in tests of drive theory. It was assumed 
that MAS scores are related to “emotional 
responsiveness,” which, in turn, contributes to 
“drive.” Predictions on the basis of these 
assumptions have been confirmed in a num- 
ber of studies (23). 


Further Studies Employing the Anxiety Scale 


In the course of a continuing study of se- 
lection and evaluation of medical school stu- 
dents being conducted at the University of 
Michigan, a good deal of interesting infor- 
mation has been amassed about the Gar- 
man Anxiety Scale. Garman’s Scale seemed 
promising enough to be used for those Ss in 
the present medical study for which Strong 
blanks were already available. It gave an 
“anxiety score” at almost no cost, and helped 
tap a personality area that had already been 
demonstrated to be related to aspects of 
medical school performance by Eron (10) 
and Shoemaker and Rohner (18). 


Methodology 


Three groups of Strong blanks were scored 
on the Garman Anxiety Scale. First, in 1952, 
the entire medical school senior class (Class 
of 1952) and the entire group of applicants
(including those who were to make up the
Class of 1956) were given the Strong. Second, 
as part of a national study conducted by the 
Association of American Medical Colleges, the 
Strong was readministered to the Class of 
1956, during their senior year. Thus we have 
Garman Anxiety scores for the Class of 1952 
seniors, and for the Class of 1956 as both ap- 
plicants and seniors, with an intervening pe- 
riod of four years between test and retest. 

The data for the Class of 1956 allow us
to look at the test-retest reliability of the 
Anxiety Scale over a period of four years. 
The correlations between the Anxiety scores 
and all the other Strong scores, available for 
almost all keys at all three administrations, 
help us to set the Anxiety Scale into the con- 
text of other Strong scores and provide sug- 
gestive leads as to anxiety components of oc- 
cupational interests. (Since these scores come 
from the same pool of questions, there is 
built-in correlation to the extent that the 
same response may contribute to a score on 
several scales. However, this effect should be 
small—the Anxiety Scale is scored on only 33 
of the 400 Strong items.) Finally, a large
number of additional variables have been cor- 
related with the Anxiety scores, for purposes 
of the larger assessment study (for a pre- 
liminary report, see [25]). From these, the 
present report will cull only the most interest- 
ing relationships. These are of two types: 
(a) correlations between Garman Anxiety 
scores and Anxiety-related scores measured 
by other instruments (Cattell’s 16 P.F., Edwards’ 
Personal Preference Blank, Allport-Vernon-Lindzey’s 
Study of Values, and McQuitty’s Health Questionnaire), 
and (b) correlations between Anxiety, measured as a part of 
the admissions procedure before students were 
accepted, and performance in medical school 
during the subsequent four years. 

The three Anxiety Scale administrations 
will be designated: (a) 1956-Seniors (the 
Class of 1956 tested as seniors), (b) 1956-Applicants 
(the Class of 1956 tested as applicants), 
and (c) 1952-Seniors (the Class of 
1952 tested as seniors). For the Class of 
1956, N = 112; for the Class of 1952, N 
= 116. The complete matrices from which 
the following data have been abstracted are 
available at the Bureau of Psychological 
Services, University of Michigan, Ann Arbor. 
The findings here reported for the Class of 
1956 are for all members of the senior class 
in attendance at the time of test administra- 
tion except women and Negroes, who were 
eliminated from the sample to increase homogeneity. 
Data were incomplete for some students, because of the 
medical school’s staggered schedule, but the one-fourth of the 
Class of 1956 that was not in attendance at 
the time of readministration of the Strong 
was an unselected group, so that no bias in 
sampling should have been introduced. 


Results 


The four-year test-retest correlation of the 
Garman Anxiety Scale was .51. When it is 
remembered that the two administrations 
were separated not only by four years, but 
also by very different sets toward taking the 
test (in 1952 the students were applicants 
actively seeking admission to medical school, 
in 1956 they were successful seniors partici- 
pating in a research program), this relatively 
long-term reliability compares favorably with 


Table 2 

Correlations Between the Garman Anxiety Scale and 
Selected Strong Occupational Interests 

                              Garman Anxiety 
                          Grad.       1952       1956 
                          Psychol.    Medical    Medical 
Occupational Interest     Students    Seniors    Seniors 

Artist ° 46 
Mathematician 29 
Chemist ° 
Production Manager —36 
Personnel Director —38 
Musician 32 
Accountant —37 
Office Man —23 
Banker —i1 
Sales Manager —30 
Author- Journalist 39 
Interest Maturity —35 
Masculinity-Femininity —21 


—22 —35 





Note. [...] 40% of the correlations between the 
Anxiety scale and the Strong scales were significant at the 
1% level (r = .18 for the sample of Psychology students; 
r = .25 for the Medical samples). 

* This correlation is i a 
Table 3 

Correlations Between the Garman Anxiety Scale and 
Selected Personality Variables Measured by 
the 16 P.F., the Personal Preference 
Blank, the Study of Values, and 
the Health Questionnaire 

                                     Garman Anxiety 
Personality Variable                 (1956 Seniors) 

16 P.F. Factors 
  Emotional Stability                    −19* 
  Adventurousness                        −37** 
  Paranoia                                38** 
  Hysteric Unconcern                      19* 
  Anxious Insecurity                      43** 

Personal Preference Blank Needs 
  Succorance                              33** 
  Dominance                              −40** 
  Endurance                              −24* 

Study of Values 
  Economic                               −19* 
  Aesthetic                               40** 

McQuitty Health Questionnaire 

* Significant at the 5% level. 
** Significant at the 1% level. 





reliabilities reported for similar personality 
questionnaires. 
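
The test-retest figure above is an ordinary product-moment correlation between the two administrations. A minimal sketch of that computation follows; the function name and the five pairs of scores are hypothetical illustrations, not data from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical Anxiety scores for five students (not data from the study).
as_applicants = [12, 18, 9, 22, 15]
as_seniors = [14, 15, 11, 19, 18]
retest_r = pearson_r(as_applicants, as_seniors)
```

With real data, each list would hold one score per student, paired by position across the two administrations.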

Table 2 presents correlations, obtained both 
by Garman on his original cross-validation 
sample, and by Uhr and Kelly on the two 
groups of medical students, between the Anx- 
iety Scale and selected Strong Scales. 

We thus find a large number of significant 
correlations, with good agreement in most 
cases between the results on all three sam- 
ples. Anxiety as measured by the Garman 
Scale is positively related to Artistic inter- 
ests, exemplified by Artist and Author-Jour- 
nalist; and Scientific interests, exemplified by 
Mathematician and Chemist. It is negatively 
related to Business and Sales interests, ex- 
emplified by Production Manager, Banker, 
and Sales Manager. Further, Anxiety is 
negatively related to two of the personality 
variables measured by the Strong: Interest 
Maturity and Masculinity-Femininity. 

Table 3 presents the significant correla- 
tions between the Garman Anxiety Scale and 
related personality variables from the Cattell 
16 P.F. (5), the Edwards Personal Preference 
Blank (8), the Allport-Vernon-Lindzey Study 
of Values (1), and the McQuitty Health 
Questionnaire (17). We find a number of 
personality variables significantly related to 
the Anxiety Scale. Further, the correlations 
with the Anxiety-related factors measured by 
the 16 P.F. and with the McQuitty argue for 
a certain amount of construct validity of the 
Garman Anxiety Scale. 

In the course of the total medical school 
study, five independent factors of Medical 
School achievement were identified, measured, 
and correlated with a large number of pre- 
dictor variables, including the Anxiety Scale 
based on the Strong completed four years 
previous to the measurement of the criteria. 
For one of these factors, “Medical Knowl- 
edge,” the Anxiety Scale predicted to a sig- 
nificant extent (r = .20, significant at the 5% 
level). This is the sort of low but significant 
relationship to be expected from previous 
studies, but two considerations make it of 
rather unusual interest. First, this is a pre- 
dictive relationship over four years. Second, 
of 52 variables measured on the Strong, Anx- 
iety was the only one that predicted “Medi- 


cal Knowledge” to a significant degree. 
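
As a rough check on what "significant at the 5% level" means for a correlation of this size, the sketch below converts r to a t statistic and compares it with a large-sample critical value. The sample size of 112 is an assumption borrowed from the Class of 1956 figure given earlier (the N for this particular correlation is not stated), and the normal approximation to the t distribution is a simplification chosen here, not the authors' procedure.

```python
from math import sqrt
from statistics import NormalDist


def t_for_correlation(r, n):
    """t statistic for testing a correlation r from n pairs against zero (df = n - 2)."""
    df = n - 2
    return r * sqrt(df / (1.0 - r * r))


def is_significant(r, n, alpha=0.05):
    """Two-tailed test using a normal approximation to the critical t (reasonable for large n)."""
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return abs(t_for_correlation(r, n)) > critical


# r = .20 is the value reported in the text; n = 112 is an assumed sample size.
t_value = t_for_correlation(0.20, 112)
significant_at_05 = is_significant(0.20, 112, alpha=0.05)
significant_at_01 = is_significant(0.20, 112, alpha=0.01)
```

Under these assumptions the correlation clears the 5% criterion but not the 1% criterion, which agrees with the level the text reports.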


Implications 


The data reported here provide additional 
information regarding the Garman Anxiety 
Scale for the Strong. We have indications of 
satisfactory long-term reliability (despite the 
very different situations at the time of the 
two testings). We have evidence of validity 
that stands up well in comparison with types 
and magnitudes of validity reported for com- 
parable tools. We find a number of extremely 
suggestive intercorrelations with a number of 
vocational interest and personality variables. 
And finally we find a significant low relation 
between Anxiety as a predictor and a cri- 
terion factor of medical school performance. 
The indications of correlates of Anxiety from 
among personality, interest, and performance 
variables bring us to questions beyond the 
scope of the present report, but they are of 
interest to us here because of their validating 
effect: the Garman Anxiety Scale is measuring 
something that is related to the somethings 
being measured by other psychological 
instruments. 

Possibly of more importance than the data 
so far collected on the Garman Anxiety Scale 
itself is the support it gives for this sort of 
interrelating of our numerous psychological 
instruments. These results would seem to 
lead to three guiding conclusions for possible 
extensions of work in this area. First, em- 
pirically derived scales permit making more 
measurements with the same investment of 
testing time, thus contributing importantly in 
combatting the often crippling limitations on 
psychological research imposed by time and 
expense. Second, interrelations between the 
numerous instruments we have at our dis- 
posal will help us know whether we are really 
measuring something new. And third, we 
might entertain the interesting possibility— 
one that would seem desirable for some fu- 
ture era of psychology when we are able to 
systematize our items and tailor the appro- 
priate measuring instrument for each situa- 
tion—of a pool of questions adequately sam- 
pling all aspects of behavior rather than a 
proliferation of question forms developed to 
answer specific questions with no knowledge 
of the larger context of behavior. Measur- 
ing instruments for important variables could 
then be built up by proper weighting of the 
related items in the pool. Variables that 
could not be measured by the available pool 
of items would be used to enlarge the pool, 
i.e., items associated with the new variables 
would have been demonstrated to be useful 
for the psychologist and would have earned 
their place in his repertoire of tools. 
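
The idea of building an instrument by weighting related items from a common pool can be sketched as a scoring key applied to one respondent's answers. Everything named below (the item numbers, the response codes, the weights, and the function) is hypothetical and only illustrates the mechanism; it is not the Garman key.

```python
def score_scale(responses, key):
    """Sum the weights of every keyed (item, answer) pair the respondent actually gave.

    responses: dict mapping item number -> the answer chosen
    key:       dict mapping (item number, answer) -> weight for that scale
    """
    return sum(
        weight
        for (item, answer), weight in key.items()
        if responses.get(item) == answer
    )


# Hypothetical key: two items, three scored responses.
example_key = {(17, "L"): 1, (17, "D"): -1, (42, "I"): 1}

# Hypothetical respondent; item 99 is in the pool but not on this scale.
example_responses = {17: "L", 42: "I", 99: "D"}
example_score = score_scale(example_responses, example_key)
```

Because a key is only a mapping over the shared pool, a second scale can reuse the same responses with a different key, which is the economy of testing time the passage argues for.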


Summary 


An Anxiety Scale for the Strong VIB has 
been developed by item analysis, using a com- 
bination of Taylor MAS and Winne Scale of 
Neuroticism items on the MMPI as the cri- 
terion measure. Four hundred graduate psy- 
chology students, randomly divided into two 
equal groups, were used as Ss. Scale items 
were chosen by means of an analysis of the 
first group and the Anxiety Scale validated 
on the second group. A test of the Anxiety 
Scale on a more heterogeneous sample was 
made in a second validation using 200 male 
freshmen as Ss. 

The Anxiety Scale consists of 33 items on 
which 46 responses are scored. Split-half re- 
liability calculated for the first cross-valida- 
tion group was .73. The two cross-validations 
gave correlations with the criterion of .36 and 
.42, which are raised by correction for attenuation 
due to unreliability of the scales to 
.44 and .51. 
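
The correction for attenuation mentioned here divides the observed correlation by the square root of the product of the two reliabilities. The sketch below uses the reported split-half reliability of .73 for the Anxiety Scale; the criterion reliability of .92 is an assumed value (the summary does not report it), chosen only because it yields figures close to those quoted.

```python
from math import sqrt


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimate the correlation between true scores: r_xy / sqrt(r_xx * r_yy)."""
    if not (0.0 < r_xx <= 1.0 and 0.0 < r_yy <= 1.0):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / sqrt(r_xx * r_yy)


SCALE_RELIABILITY = 0.73      # split-half value reported in the text
CRITERION_RELIABILITY = 0.92  # assumed for illustration; not given in the text

corrected = [
    round(correct_for_attenuation(r, SCALE_RELIABILITY, CRITERION_RELIABILITY), 2)
    for r in (0.36, 0.42)     # the two cross-validation correlations
]
```

If the criterion were less reliable than assumed here, the corrected values would be larger.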

A number of significant correlations are re- 
ported for the cross-validation sample be- 
tween the Anxiety Scale, and MMPI and 
Strong VIB Scales. Data collected in a 
Medical School Assessment Project produced 
a number of interesting correlates of the Gar- 
man Anxiety Scale from among variables 
measured by the Strong VIB, the Cattell 16 
P.F., the Edwards PPI, the AVL Study of 
Values, and the McQuitty Health Question- 
naire scales. In addition to the vocational 
interest, personality factor, and values cor- 
relates of anxiety thus identified, several anx- 
iety scales on the 16 P.F. and the Health 
Questionnaire were found to be significantly 
related to Garman Anxiety. The four-year 
test-retest correlation was .51. 


Received September 23, 1957. 
Contextual Effects in Scaling ¹ 
Jones and Thurstone (7) have recently 
suggested a method for the selection of de- 
scriptive words and phrases for use as anchor 
points on subsequent successive interval pref- 
erence scales. One assumption underlying 
their method is that the empirical meaning 
of a word remains approximately constant 
within a particular contextual framework. Ac- 
cordingly, they propose that adjectives scaled 
within the context of “food” may be used in 
scaling specific food items without an ap- 
preciable change in empirical meaning. 

Although variation in the meaning of adjec- 
tives may be limited by restricting the con- 
text within which they are scaled to a generic 
term, “food,” there is still reason to suspect 
considerable variation in meaning when the 
same adjectives are later applied to specific 
items within that class. Helson’s studies (4) 
of the adaptation level concept indicate that 
the general frame of reference would vary be- 
tween lists of items when the lists are at dif- 
ferent preference levels. This suggests the 
possibility that the frame of reference also 
varies between specific items at different 
preference levels so that the empirical mean- 
ing of descriptive adjectives on a scale might 
vary considerably with the specific item be- 
ing rated. 

Somewhat related to this position is the 
finding of Hovland and Sherif (5) that the 
biases of judges rating potential scale items 
affect the scale values even within the relatively 
narrow context of “Negro.” Within 
the more general context of “food,” then, it 
might be assumed that the food biases of 
judges rating potential scale items would af- 
fect the scale values of those items. The pur- 
pose of the present study was, therefore, to 
determine the effect of specific contextual levels 
upon the empirical meaning of some descriptive 
words and phrases used in scale construction. 
More specifically, it was hypothesized that the 
scale values of adjectives rated in the context of 
“food” would increase significantly when rated in 
the specific context of a highly acceptable food and 
decrease significantly when rated in the context of 
an unacceptable food. 

1 The authors are indebted to E. R. Dusek, J. M. 
McGinnis, and W. H. Teichner for their helpful 
criticism of the initial draft. 

Basic to the study of contextual effects is 
the determination of the consistency of em- 
pirical meaning over repeated ratings within 
the same context, as inconsistency might ob- 
viate any examination of changes due to vary- 
ing contexts. Several studies have indicated 
repetitional factors that might lead to changes 
in empirical meaning. Thurstone (9) has 
suggested that repetition results in less dis- 
crimination due to boredom, with a subse- 
quent increase in the dispersion of scale 
values. Jones (6) obtained results which 
corroborate this assumption. His studies in- 
dicated that repeated administration results in 
an increase in the error variance with no 
appreciable change in the level of response. 
With regard to the average level of empirical 
meaning, Guilford (2) has assumed, based on 
Helson’s adaptation level studies, that the 
total range of stimuli to which the S is ex- 
posed during the experiment influences the 
level of his responses. Thus, the S, during 
the experiment, develops a “central standard 
level” as a frame of reference for all of his 
judgments. Such an effect might be supposed 
to be almost immediate when a comparatively 
small number of items is used, e.g., Jones 
(6). However, when a fairly large number 
of items are involved, the composite standard 
might vary during the first administration as 
items at different preference levels are en- 
countered and on repeated administrations as 
increasing numbers of items are recalled. 
Since the present study used a large number 
of items, it was necessary first to determine 
the consistency of empirical meaning over re- 
peated administrations so that the effects of 
changing context could be more validly ex- 
amined. 
Method 
Subjects 


The Ss were 145 female clerks, secretaries, and 
stenographers at the QM Research & Engineering 
Center, Natick, Massachusetts. 


Scale 


The 51 words and phrases and the nine-category, 
successive-interval schedule used by Jones and Thur- 
stone (7) provided the basis for S’s judgments. 
Four forms of the questionnaire, each containing the 
same words and phrases in different random orders, 
were used. Forms A and B contained identical in- 
structions: “In this test are words and phrases that 
people use to show like or dislike for food. For each 
word or phrase make a check mark to show what 
the word or phrase means to you.” The instructions 
for Forms C and D were changed only by substitut- 
ing the name of a particular food item in place of 
the word “food”; Form C contained the words 
“roast beef” and Form D the words “stewed kid- 
neys.” Each of the forms included appropriate ex- 
amples. 


Procedure 


Form A was administered individually to all Ss 
during the first week of the study and Form B to 
the same Ss one week later. Two weeks after filling 
out Form B, one half of the Ss completed Form C 
and the other half, Form D. 

Several Ss were omitted for administrative reasons 
during the course of the experiment. In addition, 
Ss showing inconsistent performance, based on the 
criteria established by Jones and Thurstone (7), 
were eliminated from the study. 

The psychophysical method presented by Edwards 
(1) for scaling by successive intervals was used to 
determine the scale values of the items on each ques- 
tionnaire. Words showing cumulative proportions of 


Table 1 
Rater Reliability and Algebraic Deviations of Scale Values 

                           Rater Reliability       Algebraic Deviations 
                               (n = 129)                 (n = 35) 
                           Lower      Upper        Food to      Food to 
Word                       Bound*     Bound        “Beef”       “Kidneys” 

 1. Neutral 
 2. Despise 
 3. Loathe 
 4. Best of all 
 5. Dislike extremely 
 6. Dislike intensely 
 7. Excellent 
 8. Favorite 
 9. Like intensely 
10. Dislike slightly* 
11. Like slightly* 
12. Mildly like* 
13. Dislike very much 
14. Strongly dislike 
15. Like extremely 
16. Terrible 
17. Mildly dislike* 
18. Wonderful 
19. Like quite a bit* 
20. Like very well* 
21. Very bad 
22. Especially good* 
23. Like very much 
24. Highly unfavorable 
25. Good* 



+.17 


+.38 

+.33 — .02 
+.33 — .88 
+.43 —.26 
+.38 —.36 


+.08 +.07 





* Items used in scale value comparison of Form A with Form B. 
b Words without scale values fell in extremes and could not be scaled. + values indicate a positive change and − values 
a negative change in scale value. 
c An r of [value illegible] is necessary for significance at the .01 level. 
Table 1 (Continued) 





Rater Reliability 
(n = 129) 





Algebraic Deviations” 
(n = 35) 





Lower 


Word Bound* 


26. Highly favorable* 57 
27. Like moderately* 57 
28. Like fairly well* 57 
29. Very good* 56 
30. Strongly like* 

31. Dislike moderately* 

32. Bad 

33. Average* 

34. Fair* 

35. Like* 

36. Acceptable* 

37. O. K.* 

38. Mighty fine* 

39. Not pleasing* 

40. Enjoy 

41. Pleasing* d 
42. Tasty* 45 
43. Only fair* 42 
44. Like not so much* 42 
45. Poor Al 
46. Preferred* 40 
47. Dislike 40 
48. Welcome* 40 
49. Don’t like 39 
50. Don’t care for it* 38 
51. Like not so well* 32 


Food to 
“Beef” 


Food to 
“Kidneys” 


Upper 
Bound 








response in excess of .50 in the extreme categories 
could not be scaled by this method and were omitted. 
No attempt was made to derive scale values by 
graphic methods, since they were not deemed exact 
enough for the purposes of this study. After omit- 
ting nonscalable items, 31 items were available for 
the scale value comparison of Form A with Form B 
and 35 items for the comparisons of Form B with 
Forms C and D. 
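
The screening rule described above, under which a word is dropped when more than half of its responses fall in an extreme category, can be sketched as follows. The function name and the two example frequency rows are hypothetical; only the .50 cutoff and the nine-category schedule come from the text.

```python
def is_scalable(category_counts, limit=0.50):
    """Return True unless the proportion of responses in either extreme category exceeds limit.

    category_counts: response frequencies for one word, ordered from the lowest
    to the highest of the nine successive-interval categories.
    """
    total = sum(category_counts)
    if total == 0:
        raise ValueError("no responses recorded for this word")
    lowest_share = category_counts[0] / total
    highest_share = category_counts[-1] / total
    return lowest_share <= limit and highest_share <= limit


# Hypothetical frequency rows over nine categories (not data from the study).
middling_word = [0, 2, 10, 30, 45, 25, 12, 4, 1]
extreme_word = [0, 0, 0, 0, 1, 3, 10, 35, 80]
```

Here `is_scalable(middling_word)` returns True and `is_scalable(extreme_word)` returns False, since most responses to the second word pile up in the top category.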


Results and Discussion 
Consistency of Response 


The primary purpose of the first phase of 
this study was to determine the consistency 
of the raters’ responses over repeated adminis- 
trations and the effect of repetition upon the 
empirical meanings of the adjectives. To 
determine the consistency of the raters’ re- 
sponses from Form A to Form B, ordinal 
numbers were assigned to the nine-point scale 


72 _ —_ 

69 — .48 —.79 
61 —.17 —.45 
.66 +.40 — 43 
.66 +.61 —.37 
71 +.17 —.67 
69 —.12 
—.37 
—.47 


d +.35 
65 +.69 
71 +. .36 
70 —.15 
69 +.45 
.67 —.30 
65 +.34 
58 +.45 
.63 +.41 
56 +.13 
65 .00 
62 +.45 
.52 +.12 
.63 +.18 
.62 +.72 
62 +.32 
.58 — 39 
60 +.37 
.60 +.13 
52 +.21 


+.34 
—.21 





upon which the raters made their judgments 
and Guttman’s (3) analysis for qualitative 
data was applied to obtain the upper and 
lower bounds for the reliability of each of the 
51 adjectives. The resulting coefficients are 
presented in Table 1. It will be seen that, 
while the lower bound coefficients are all significantly 
greater than zero, the majority are sufficiently low to 
indicate considerable inconsistency in rating any one 
adjective. This inconsistency might be due to (a) ambiguity 
of the subjective meaning of the adjectives, (b) an 
increase in the number of random responses 
on the second administration due to 
boredom, or (c) variations in the level of 
response due to changes in the composite 
standard. 

Jones (6) has demonstrated a normal error 
distribution for repeated ratings of items, so 
ambiguity of the subjective meanings would 
not be expected to affect the resulting unit of 
scale measurement. However, Jones found 
that an increase in the number of random 
responses over repeated administrations re- 
sulted in an increase in the unit of measure- 
ment. To determine the effect of this factor 
in the present study, the covariation of the 
scale values from the two forms was studied. 
Scale values for the adjectives based upon the 
distributions of response on Form B were 
plotted against scale values based upon dis- 
tributions of response on Form A as shown in 
Fig. 1. The coefficient of correlation is .99 
and the values do not deviate significantly 
from a linear function. The slope, 1.22, of 
the line of best fit indicates that the unit of 
measurement based upon the responses on 
Form B is about 20% larger than on Form A. 
This is consistent with Jones’ (6) finding of 
an increase in size of unit of measurement. 
To test for a change in the level of the re- 
sponses that might be attributed to changes 
in the composite standard of the raters, a t 
test for related measures (8) was calculated 
for the scale values of the two forms. The 
resultant t of 1.51 with 30 degrees of freedom 
was not significant at the .05 level of signifi- 
cance. Thus, on the basis of these tests one 
might conclude that further administrations 
within the same context could lead to further 
increases in response dispersion but not 
changes in response level. 

Fig. 1. Scale values based upon responses to Form 
B plotted against scale values based upon responses 
to Form A. 

Fig. 2. Scale values obtained under specific contexts 
of “roast beef” (Form C) and “stewed kidneys” 
(Form D) compared with scale values derived under 
the general context, “food” (Form B). 


Contextual Effects 


To examine the contextual effects, scale 
values obtained from the responses of Ss un- 
der the specific context “roast beef” were 
compared with the scale values derived from 
the responses of the same Ss under the con- 
text of “food”; a similar comparison was 
made for Ss responding under the specific 
context of “stewed kidneys.” These relation- 
ships are plotted in Fig. 2. Neither relation- 
ship deviated significantly from a linear func- 
tion. In both cases, the correlation coeffi- 
cient was .99 and the slope of the line of best 
fit was approximately 1.0, indicating that the 
unit of measurement was approximately equal 
for the three forms.² 
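
The comparison of units of measurement rests on the slope of the line of best fit between two sets of scale values: a slope near 1.0 means equal units, and a larger slope means a stretched unit on the second form. The sketch below fits an ordinary least-squares line and reports the correlation alongside it. Least squares is an assumption here, since the text does not say how the line was fitted, and the scale values are hypothetical.

```python
from math import sqrt


def fit_line(x, y):
    """Least-squares line y = intercept + slope * x, plus the Pearson correlation."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical scale values for five adjectives under two conditions.
general_context = [-2.0, -1.0, 0.0, 1.0, 2.0]
specific_context = [-2.1, -0.9, 0.1, 1.0, 2.1]
slope, intercept, r = fit_line(general_context, specific_context)
```

A shift in the intercept with an unchanged slope would correspond to a change in response level rather than in unit, which is the question taken up next in the text.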

2 Such a finding cannot be interpreted as indicating 
that the boredom effect remains constant after the 
second administration. The introduction of a more 
specific context may have decreased the ambiguity of 
the descriptive phrases, thereby interacting with 
increased randomness of response to negate any 
possible change. 

To test for differences in the mean level of 
response due to changes in the contextual orientation 
of the raters, algebraic deviations of 
the scale values were obtained between the 
control and test conditions. The deviations 
are presented in Table 1. It can be seen that 
the majority of the deviations are in the predicted 
direction. A t test for related measures 
(8) was calculated for each comparison. 
Both t’s (4.58 and 6.17 with 34 degrees of 
freedom) are significant at the .01 level, 
indicating verification of the hypotheses. 
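
A t test for related measures works on the per-item differences between the two conditions: the mean difference is divided by its standard error, with degrees of freedom one less than the number of items. The following is a minimal sketch of that calculation; the function name and the four pairs of scale values are hypothetical, not the Table 1 data.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(control, test):
    """t statistic and degrees of freedom for paired (related) measures."""
    if len(control) != len(test) or len(control) < 2:
        raise ValueError("need two equal-length lists with at least two items")
    # Algebraic deviation for each item: test condition minus control condition.
    deviations = [t - c for c, t in zip(control, test)]
    n = len(deviations)
    standard_error = stdev(deviations) / sqrt(n)
    return mean(deviations) / standard_error, n - 1


# Hypothetical scale values for four adjectives (not data from the study).
food_context = [-1.5, -0.4, 0.6, 1.8]
beef_context = [-1.2, 0.0, 0.9, 2.3]
t_value, df = paired_t(food_context, beef_context)
```

With 35 scalable items, as in the study, the same function would return 34 degrees of freedom, matching the figure reported above.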

While a number of scaling procedures as- 
sume that the empirical meaning of an adjec- 
tive remains fairly stable within a particular 
context, the results of the present study sug- 
gest that this assumption may not always be 
valid. Thus, within the “restricted” context 
of “food,” the further restriction of context 
by the introduction of specific foods has been 
demonstrated to increase, or decrease, the 
scale values by a constant amount; this 
change seems to be directly related to the 
level of acceptance of the contextual food. 
This finding suggests that an empirical de- 
termination of the extent to which a context 
is restricted may be necessary before valid 
application of a derived scale can be made. 
It further suggests that when a scale derived 
in a particular context, e.g., “food,” is ap- 
plied as a continuous interval scale to the 
rating of seemingly homogeneous items, e.g., 
specific foods, the resultant scale values may 
be inaccurate due to the raters’ biases con- 
cerning the rated items. 


Summary 


The study was designed to determine the 
effect of specific contextual levels upon the 
empirical meaning of adjectives used in scale 
construction. It was hypothesized that the 
scale values of adjectives rated in the con- 
text of “food” would increase significantly 
when rated in the more specific context of a 
highly acceptable food and decrease signifi- 
cantly when rated in the context of an unac- 
ceptable food. 
Four forms of a scale, identical in content 
but varying in order of words, were adminis- 
tered to 145 female Ss. All Ss received Forms 
A and B, which were in the context of “food,” 
one week apart. Two weeks later, one half of 
the Ss received Form C in the specific context 
of “roast beef” and the other half Form D in 
the specific context of “stewed kidneys.” 

Analysis of the results indicated that the 
hypotheses were verified. 

Findings regarding consistency of responses 
are also presented and discussed. 


Received July 8, 1957. 
The situational test involving small groups 
of Ss has made marked gains in popularity in 
the course of the last few years. Certain basic 
problems exist, however, for which we must 
find better solutions than we now have before 
the small group situational test can be used 
with confidence as a measurement tool. Per- 
haps the most important problem is concerned 
with the degree to which characteristics of the 
individual S determine the S’s role in small 
group activities. The “great man” theory of 
small group leadership considers the charac- 
teristics of the individual to be a major factor 
(1, 3). Other theories emphasize the nature 
of the small group situation or the interaction 
with other group members as the major variable 
in small group behavior (2, 4, 5, 7, 8, 9, 10). 
If the “great man” theory holds even 
to a limited degree (that is, when type of 
problem and composition of group are varied 
only within moderate limits) it would permit 
the prediction of certain aspects of the S’s 
small group behavior such as leadership emer- 
gence without knowing the precise nature of 
the small group situation or the specific indi- 
viduals making up the remainder of the group. 
This would be particularly advantageous in 
military selection and classification where the 
“tailor making” of a group is not generally 
feasible. 

1 The experimental work for this study was carried 
out under the Air Force Personnel and Training 
Research Center in support of Project 7719, Task 
17008. Permission is granted for reproduction, 
translation, publication, and use or disposal in whole or 
in part by or for the United States Government. 

Implicit in the above are several hypotheses 
concerning certain aspects of small group behavior 
and situational testing. The present 
study was designed to test the following specific 
hypotheses, which are stated as questions 
to provide greater clarity: 


1. Is small group leadership behavior as 
measured by observer ratings related to per- 
sonality characteristics as measured by per- 
sonality trait ratings? 

2. Are there differences in the levels of re- 
lationships between personality characteristics 
and leadership scores on two different types 
of situational tests? 

3. Are relationships between personality 
characteristics and leadership scores on situa- 
tional tests a reflection of overall halo or do 
they indicate that specific personality charac- 
teristics are related to small group leadership? 

4. Do the patterns of relationships be- 
tween personality characteristics and leader- 
ship scores on two different types of situa- 
tional tests differ significantly? 

5. Does the degree or the pattern of the 
relationship between ratings of personality 
characteristics and leadership scores change 
significantly when the type and amount of 
contact between the rater and the subject is 
changed? 


Method 
Subjects 


The data analyzed in the present investigation 
were gathered in connection with an assessment of 
all male members of USAF Officer Candidate School 
Classes 1955B and 1955C. The study is based on 
125 male graduates of Class 55B and 96 male gradu- 
ates of Class 55C for whom complete test data were 
available. Before entering OCS the Ss had been 
screened in such a way that all were at least high 
school graduates with at least one year of enlisted 
military service. Eighty-five per cent of the group 
would fall in the upper 10% of the general popula- 
tion with respect to general intelligence, and the ma- 
jority were planning on a career in the Air Force. 
Procedure 


As the Ss in each class reported to the school 
(55B in late December 1954 and 55C in March 
1955) they were assigned to six-man teams, in such 
a way that no member of a team had been ac- 
quainted with any other team member prior to re- 
porting to OCS. Fellow team members were in very 
close contact during the three and one-half day test- 
ing period and were isolated from other teams, ex- 
cept that they shared a barracks floor and ate with 
members of one other team. Each team was ad- 
ministered the group performance tests, which make 
up two of the variables in the present study. 

At the close of the testing period, the Ss were re- 
grouped into OCS flights of about 20 candidates. 
Each flight was supervised by a Tactical Officer and 
by a corresponding flight of OCS upperclassmen. 
Members of a flight were quartered together and 
attended classes and participated in other types of 
training as a unit. In reforming the assessment 
groups into OCS flights, care was taken so that no 
two members of any assessment group were assigned 
to the same OCS flight. 


Variables 


Small group leadership tests. 1. Project X Leader- 
ship Test. This small group leadership test con- 
sisted of 12 situational construction problems. These 
problems all required cooperation of team members 
in developing and carrying out a solution, and pro- 
vided a situation in which leadership behavior could 
be displayed. In a typical problem, the team was 
placed in a prison compound and required to escape 
across a moat and over a solid board fence. A ladder 
was provided but it was not long enough to reach 
from the edge of the moat to the top of the fence. 
Several short lengths of rope were also provided. 
Certain areas, such as the moat, were painted red 
and could not be touched. The problem was solved 
by holding the ladder at an angle over the moat 
with the ropes. One team member climbed the lad- 
der and jumped to the fence. Once over the fence 
he discovered additional props which could be used 
to help the remainder of the team to escape. A time 
limit of 12 minutes of actual working time was im- 
posed on each problem, which was not usually suffi- 
cient to complete the problem. Solution of the prob- 
lem was not necessary, however, as the rating scales 
used were directed toward behavior of team mem- 
bers rather than solution of the problem. The prob- 
lems were administered by Air Force officers attached 
to the Officer Candidate School. Two officers were 
assigned to each team to administer the first six 
problems. At the end of six problems, the officers 
were shifted and the last six problems were adminis- 
tered by two other officers. Six upperclassmen were 
assigned to each team to act as raters. A rater was 
assigned to each S in the team, and rated only this 
S on a given problem. At the end of each problem, 
raters were rotated, so that each S was rated twice 
by each rater in the course of completing the 12 
problems. 
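
The rotation just described can be illustrated with a simple cyclic schedule. The sketch below is only one arrangement that satisfies the stated constraints (six raters, six Ss per team, 12 problems, each S rated twice by each rater); the report does not say which rotation was actually used, and the function name and the modular rule are assumptions made for illustration.

```python
from collections import Counter

N_SUBJECTS = 6    # Ss per team
N_RATERS = 6      # upperclassman raters per team
N_PROBLEMS = 12   # Project X problems

def rater_for(subject: int, problem: int) -> int:
    """Hypothetical cyclic rule: every rater moves on by one S after each problem."""
    return (subject + problem) % N_RATERS

# schedule[problem][subject] -> index of the rater observing that S
schedule = [[rater_for(s, p) for s in range(N_SUBJECTS)]
            for p in range(N_PROBLEMS)]

# Count how often each (subject, rater) pairing occurs over the 12 problems.
pair_counts = Counter(
    (s, schedule[p][s]) for p in range(N_PROBLEMS) for s in range(N_SUBJECTS)
)

for p, row in enumerate(schedule, start=1):
    print(f"problem {p:2d}: raters for Ss 1-6 -> {[r + 1 for r in row]}")
```

Under this rule each rater observes exactly one S on every problem, and every S meets every rater twice.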

The rating device used was a behavior check list 
consisting of 11 categories, each concerned with one 
aspect of leadership behavior. The rater was asked 
to place a check mark for the man he was observing 
each time the man displayed one of the specific be- 
haviors in the check list. Scores were obtained for 
each S by summing the checks he received across all 
12 problems. 

Each of the four officers supervising the team dur- 
ing the 12 problems made independent general rat- 
ings of leadership ability based on their observations 
of the Ss. These ratings were combined with the 
check list rating scores to obtain the Project X lead- 
ership scores used in this research. This score has 
an estimated reliability (based on rater agreement) 
of .80. 

2. Leaderless Group Discussion. This test was 
used and scored in essentially the form outlined by 
Bass (1). A topic (In the event of all-out mobiliza- 
tion, should the Air Force rely on the AFROTC pro- 
gram or a greatly expanded OCS for its major source 
of new officers?) was assigned to each group. Thirty 
minutes were allowed for discussion. Four observers 
rated each S on each of nine leadership behavior 
items, using a five-point rating scale. 

Analyses indicated that the four raters agreed mod- 
erately well on the ratings of the separate items. 
The items were found to intercorrelate fairly highly 
indicating the presence of a large general factor. A 
total Leaderless Group Discussion score was com- 
puted for each S by summing the ratings he received 
across all nine items and all four raters. Based on 
the agreement between raters, the total LGD score 
had an estimated reliability of .82. 
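
The scoring rule and the reliability estimate can be sketched as follows. The report states only that reliability was estimated from rater agreement; the Spearman-Brown formula used here is an assumption about how such an estimate is commonly obtained, and the ratings in the example are hypothetical.

```python
def total_score(ratings):
    """Sum one S's ratings over all raters and items (ratings[rater][item])."""
    return sum(sum(items) for items in ratings)

def spearman_brown(single_rater_r: float, k: int) -> float:
    """Reliability of a sum over k raters, given the average inter-rater r."""
    return k * single_rater_r / (1 + (k - 1) * single_rater_r)

def implied_single_rater_r(pooled_r: float, k: int) -> float:
    """Inverse of Spearman-Brown: inter-rater r implied by a pooled reliability."""
    return pooled_r / (k - (k - 1) * pooled_r)

# Hypothetical ratings for one S: 4 raters x 9 items on a five-point scale.
example = [
    [3, 4, 3, 2, 4, 3, 3, 4, 3],
    [4, 4, 3, 3, 4, 3, 2, 4, 3],
    [3, 3, 3, 2, 5, 3, 3, 3, 4],
    [4, 4, 2, 3, 4, 4, 3, 4, 3],
]
print("total LGD score:", total_score(example))

# Average inter-rater correlation that would yield .82 for four pooled raters.
r1 = implied_single_rater_r(0.82, 4)
print("implied single-rater r:", round(r1, 3))
```

If the assumption holds, a pooled reliability of .82 for four raters corresponds to an average agreement between any two raters of a little over .50, which fits the description of moderate agreement.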

Personality ratings. Ratings on a number of bi- 
polar personality traits similar to those of Cattell 
(6) were obtained on each S on three occasions. 
The first set of ratings was obtained from two upper- 
classmen who observed the S’s behavior in six role- 
playing interpersonal problem situations dealing with 
military administration, management, and human relations. 
The total time of observation on which the 
personality ratings were based was less than one hour. 
These ratings, designated below as the 1-Hour ratings, 
had an estimated average reliability of .35 based 
on the agreement between the two raters. 

A second set of personality ratings was made by 
members of the S’s 6-man team and by members of 
the team with whom he had shared the barracks 
floor, at the end of the 3½-day assessment period. 
These ratings are designated hereafter as the 3-Day 
ratings. Their average reliability, as estimated from 
the agreement among the 12 raters, is .45. 

The third set of personality ratings was obtained 
at the end of the six months’ OCS period. All mem- 
bers of each flight (15 to 20) rated each other. These 
ratings are called the 6-Month ratings. Their aver- 
age reliability, based on the agreement among raters, 
is about .85. 

These three sets of ratings are experimentally independent 
since no rater rated any S on more than 
one set of ratings. Further, raters on any one set 
of ratings had no opportunity to observe their Ss in 
the behavior situations on which any other set of 
ratings was based. Thus, the 6-Month raters had 
not observed their Ss during the assessment nor had 
the 3-Day raters observed their Ss during the situa- 
tional problems on which the 1-Hour ratings were 
based, etc. 

Based on the intercorrelations among similar rat- 
ings obtained on several earlier OCS classes, the rat- 
ings of the present study were combined into 12 
clusters, containing from one to seven variables each. 
These clusters are listed below. 


1. General Adjustment. 
2. Extroversion. 
3. Effective Intelligence. 
4. Determination. 
5. Assertiveness. 
6. Social Maturity. 
7. Lack of Neuroticism. 
8. Unconventional-Conventional. 
9. Attentiveness to People. 
10. Insistently Orderly. 
11. Adaptability. 
12. Energetic. 


Analysis of Data 


Pearson product-moment correlations were computed 
between the scores on each situational test and 
the three sets of ratings on each personality trait 
cluster. These 72 correlation coefficients were converted 
to Fisher’s z equivalents, and the specific null 
hypotheses were tested by a triple classification analysis 
of variance. 
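
The conversion step can be illustrated briefly. Fisher's transformation is z = 0.5 ln[(1 + r)/(1 - r)], which is the inverse hyperbolic tangent of r. The count of 72 follows from 2 situational tests, 3 sets of ratings, and 12 trait clusters; the r values in the loop are illustrative only and are not taken from the tables.

```python
import math

def fisher_z(r: float) -> float:
    """Fisher's r-to-z transformation: z = 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)

def inverse_fisher_z(z: float) -> float:
    """Convert a z equivalent back to a correlation coefficient."""
    return math.tanh(z)

N_TESTS, N_RATING_SETS, N_CLUSTERS = 2, 3, 12
n_coefficients = N_TESTS * N_RATING_SETS * N_CLUSTERS
print("coefficients entering the analysis:", n_coefficients)

for r in (0.05, 0.15, 0.30, 0.45):  # illustrative values only
    print(f"r = {r:.2f} -> z = {fisher_z(r):.3f}")
```

For correlations of the size met in this kind of work the z equivalent differs little from r itself; the transformation matters mainly because it gives the coefficients an approximately constant sampling variance, as the analysis of variance requires.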


Results 


In Table 1 are shown the Fisher’s z equivalents 
of the correlations between the Project X 
and LGD scores and each trait variable as 
rated under each rating condition. These relationships 
are not independent, so that a comparison 
of the obtained distribution of z’s 
with that expected by chance is not appropriate. 
However, the fact that 36 (one-half) 
of the z’s are significant at or beyond the 5% 
level, and the fact that the average z based on 
the whole table is significant at the 5% level, 
make it fairly safe to give an affirmative answer 
to the first hypothesis. It may be concluded 
that some relationship exists between 
personality characteristics and performance 
on small group leadership tests. 

Table 2 presents the results of the analysis 
of variance designed to test the remaining 
hypotheses. 

The significance of the main effects of prob- 


Table 1 


Relationships Between Project X and LGD Leadership Scores and Personality Trait Ratings 
Obtained Under Several Conditions 


                          Project X Scores               LGD Scores 
                          Rating Condition               Rating Condition 
Personality Variable      1-Hour   3-Day   6-Month       1-Hour   3-Day   6-Month 

. Adjustment —09 —06 —13* 
. Extroversion 11 a 35** 
. Intelligence 00 10 04 

. Determination 04 io** —(O4 11 10 13° 
Assertiveness 15* 30** 27** 19** aad ag°° 
. Social Maturity 07 48** 11 15* 31** 32°* 
. Lack of Neuroticism 04 24** 05 12 09 14* 
. Conventionality 00 16* 14* —02 11 10 

. Attentiveness 06 OF —01 13* 16* 09 

. Orderliness 15* 08 —07 14* 06 02 

. Adaptability —07 05 03 18** 03 18** 
. Energetic 13* 27** is? = si** 18** 


18** 07 13* 16* 16* 


—02 00 06 
21** 22** 22** 
19** 28** 7 




Average 05 





Note: Relationships expressed as Fisher's z equivalents (with decimals omitted) based on product-moment correlation 
coefficients. 
* Significant at the 5% level. 
** Significant at the 1% level. 


Table 2 


Analysis of Variance of Relationships Between 
Leadership Scores and Personality 
Trait Ratings 


                                    Variance 
Source                        df    Estimate    F      p 

Problem Type                   1     465.1             .05 
Rating Type                    2     380.8      5.     .05 
Trait Variable                11     456.2             .001 
Problem X Rating                     195.0             NS 
Problem X Trait                       67.3 
Trait X Rating                        28.2 
Problem X Rating X Trait              70.3 

Total 


lem type and trait variable confirms the sec- 
ond and third hypotheses. It appears that 
the level of the relationship between person- 
ality trait ratings and situational test per- 
formance is a function of both the kind of 
problem and of the specific traits rated. 

The main effect of rating type was also 
found to be significant, indicating that the 
level of correlations between trait ratings and 
situational test performance is also a function 
of the rating conditions. 
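
The variance estimates of Table 2 can be turned into F ratios under two assumptions that the report does not state explicitly: that the triple interaction (Problem X Rating X Trait) served as the error term, and that the degrees of freedom follow from a 2 x 3 x 12 classification with one z per cell. The sketch below is offered as a rough check under those assumptions, not as the authors' computation.

```python
# Variance estimates as printed in Table 2.
variance_estimates = {
    "Problem Type": 465.1,
    "Rating Type": 380.8,
    "Trait Variable": 456.2,
    "Problem X Rating": 195.0,
    "Problem X Trait": 67.3,
    "Trait X Rating": 28.2,
}
ERROR_TERM = 70.3  # Problem X Rating X Trait; ASSUMED to be the error term

# Degrees of freedom ASSUMED from a 2 x 3 x 12 layout with one z per cell.
levels = {"Problem": 2, "Rating": 3, "Trait": 12}
p, r, t = (levels[name] - 1 for name in ("Problem", "Rating", "Trait"))
degrees_of_freedom = {
    "Problem Type": p,
    "Rating Type": r,
    "Trait Variable": t,
    "Problem X Rating": p * r,
    "Problem X Trait": p * t,
    "Trait X Rating": t * r,
}
error_df = p * r * t

f_ratios = {src: ms / ERROR_TERM for src, ms in variance_estimates.items()}

for src, f in f_ratios.items():
    print(f"{src:18s} df = {degrees_of_freedom[src]:2d}, {error_df}   F = {f:5.2f}")
```

On these assumptions the three main effects give F ratios well above 1, while the two interactions involving traits fall below 1, which agrees with the pattern of significance reported in the text.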

None of the three first order interactions 
were significant, indicating that the patterns 
of the correlations between personality traits 
and situational test performance do not differ 
significantly when either the problem type or 
rating condition was varied. Further evidence 
of a positive nature relating to the similarity 
of the patterns of the relationships is 
presented in Table 3, which shows the inter- 
correlations of the patterns of correlation co- 
efficients. 
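
The pattern comparison amounts to a rank-order (rho) correlation between two sets of 12 coefficients, one per trait cluster. A minimal sketch follows; the two patterns are hypothetical values chosen for illustration, not the entries of Table 1, and the helper names are assumptions.

```python
def average_ranks(values):
    """Rank from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman_rho(x, y):
    """Rho computed as the product-moment correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# HYPOTHETICAL z equivalents for the 12 trait clusters under two conditions.
pattern_a = [0.05, 0.20, 0.10, 0.04, 0.15, 0.07, 0.03, 0.00, 0.06, 0.14, -0.07, 0.13]
pattern_b = [0.02, 0.25, 0.12, 0.11, 0.30, 0.31, 0.09, 0.10, 0.16, 0.06, 0.03, 0.18]
print(round(spearman_rho(pattern_a, pattern_b), 2))
```

A high rho indicates that traits ranking high in their relation to one problem score under one rating condition also tend to rank high under the other, whatever the absolute level of the correlations.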


Conclusions 


The results of this study have led to the 
conclusion that the personality traits associ- 
ated with successful performance in two types 
of small group activity do not differ in rela- 
tive importance, although there is evidence to 
show that, overall, the personality character- 
istics employed in this research are more 
highly related to the Leaderless Group Dis- 
cussion than to Project X. This is especially 
true in view of the fact that Project X, which 
consists of 12 construction-escape problems, 
apparently provides an opportunity to ob- 
serve a greater range of behavior character- 
istics than does the LGD which consists al- 
most entirely of verbal (oral) activity. It 
should not be concluded from these results 
that the two types of problem yield equiva- 
lent scores or that, apart from personality 
characteristics, the same abilities are required 
for successful leadership in the two situations. 
The obtained correlation between the Project 
X and LGD scores was only .34, indicating 
that the two have only a small amount of 
variance in common, and a considerable pro- 
portion of specific variance which has been 


Table 3 


Intercorrelations Among Trait-Problem Correlation Patterns for the Three Rating Conditions 


Correlation Pattern 
of Traits 


Project X Versus 

1. 1-Hour Ratings 
2. 3-Day Ratings 
3. 6-Month Ratings 


Leaderless Group Discussion Versus 
4. 1-Hour Ratings 
5. 3-Day Ratings 
6. 6-Month Ratings 
7. Sum of 1 through 6 





Rho Correlation Coefficients* 


4 5 6 





55 ‘ d .76 
51 71 ‘ 88 
.26 . mK 81 


74 
88 
91 
xX 





* Obtained by rank-ordering the 12 correlation coefficients between each problem score and the 12 personality characteristics 
as rated under each condition, and then computing the rho’s between these rank orders. 

found in other analyses of the present research 
data to be differentially related to 
athletic-physical ability in the case of Project 
X and to general intelligence, intellectual in- 
terests, and verbal ability in the case of the 
LGD. 

Significant differences were also found in 
the level of the correlations between the prob- 
lem scores and the personality characteristics 
which could be attributed to the source of the 
personality ratings. The 1-Hour ratings cor- 
related significantly lower than the 3-Day rat- 
ings but not significantly lower than the 6- 
Month ratings. The 3-Day and 6-Month cor- 
relations did not differ significantly. These 
differences appear to be mostly a function of 
differences in the reliabilities of the three 
types of ratings, and would probably disap- 
pear if the correlations were corrected for un- 
reliability of the ratings. 
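
The argument about unreliability can be made concrete with the classical correction for attenuation, in which an observed correlation equals the true correlation multiplied by the square root of the product of the two reliabilities. The sketch below uses the reliabilities reported above (.35, .45, and .85 for the ratings; .80 and .82 for the tests); the true correlation of .40 is hypothetical, and the report does not say that this correction was actually carried out.

```python
import math

rating_reliability = {"1-Hour": 0.35, "3-Day": 0.45, "6-Month": 0.85}
test_reliability = {"Project X": 0.80, "LGD": 0.82}

def attenuated(true_r: float, rel_x: float, rel_y: float) -> float:
    """Correlation expected between two fallible measures of given reliability."""
    return true_r * math.sqrt(rel_x * rel_y)

def corrected(observed_r: float, rel_x: float, rel_y: float) -> float:
    """Classical correction for attenuation."""
    return observed_r / math.sqrt(rel_x * rel_y)

TRUE_R = 0.40  # HYPOTHETICAL true relationship, identical in every condition

expected_observed = {
    (test, cond): attenuated(TRUE_R, rel_t, rel_c)
    for test, rel_t in test_reliability.items()
    for cond, rel_c in rating_reliability.items()
}

for (test, cond), r_obs in expected_observed.items():
    print(f"{test:9s} {cond:7s} expected observed r = {r_obs:.2f}")
```

With an identical underlying relationship, the least reliable ratings would show the lowest observed correlations, so differences in level among rating conditions need not reflect differences in validity.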

Other analyses have indicated that the trait 
variables do not differ appreciably in reli- 
ability within any type of rating condition. 
Thus, the obtained significant differences in 
their correlations with the small group leader- 
ship problems are probably a function of their 
relative importance to success in these prob- 
lems. The last column of Table 1 shows the 
average relationships of each trait variable 
across both types of problem and all rating 
conditions. Four traits (Extroversion, As- 
sertiveness, Social Maturity, and Energetic) 
are significantly correlated at or beyond the 
1% level, and Effective Intelligence is signifi- 
cantly correlated at the 5% level. These 
traits appear to have somewhat of a social 
orientation, whereas most of the traits which 
are not significantly correlated (Adjustment, 
Determination, Orderliness, and the others) 
appear to be more personally oriented. 

With respect to the “great man” theory, 
these data seem to confirm that to some extent, 
leadership in a small group situation depends 
upon the characteristics brought to the 
situation by the Ss themselves, although the 
nature of the problem situation, and, per- 
haps, the nature of the group, will also help 
determine which Ss are seen as the leaders. 
Personality requirements for leadership in dif- 
ferent types of situational tests are similar 
with respect to patterns of personality char- 
acteristics, although not in level, and account 
for most of the common variance in this re- 
search. 


Received September 30, 1957. 


Wrapper Influence on the Perception of Freshness in 
Bread 


Robert L. Brown 


Furman University 


In previous consumer research (1) it has 
been found that the two properties looked for 
in bread by the majority of people are fresh- 
ness and flavor. Freshness is determined 
largely by “feeling” of the loaves of bread 
and flavor is determined by taste. At the 
present time, breads are being wrapped in a 
number of different type wrappers ranging 
from a cellophane to a heavy wax or plastic. 
There are perhaps both advantages and dis- 
advantages to each type of wrapper. Persons 
engaged in the marketing of bread have often 
reported the opinion that some breads sell 
better in one wrapper than in another. Many 
hypotheses have been made with reference to 
this difference, but little has been done in the 
way of controlled research. This is the sec- 
ond in a series of proposed studies on the in- 
fluence of the wrapper on the perception of 
freshness in bread. 

In the original study conducted at Purdue 
University, 1955 (2), it was hypothesized 
that the tactual sensations aroused by the 
wrapper influence the perception of freshness 
in bread. More specifically, it was hypothe- 
sized that two loaves of equal freshness, but 
with different type wrappers, would be judged 
to have differential degrees of freshness. 

In testing the above stated hypothesis, four 
different type wrappers were selected. These 
were: cellophane; Saran; regular wax; and a 
special wax with a subwrapper. The experi- 
ment was performed in a laboratory situation 
in which 16 male and 16 female students 
were used as Ss. Sixteen loaves of fresh 
bread, all baked together during the night 
before the experiment, were rewrapped in the 
various wrappers by the experimenters. The 
Ss were seated parallel with the side of a 
table with their right arm around behind a 
screen. The S was instructed that this was 
an experiment to determine whether or not 
people can tell how fresh bread is by feeling 
it; that one loaf and then another would 
be presented under the S’s hand; that he 
should feel the one and then the other and 
tell the experimenter which of the two was 
fresher. No equal judgments were allowed. 

A full paired-comparison design was used 
and the pairs were randomly presented. The 
responses of the Ss were recorded on indi- 
vidual record sheets for the purpose of subse- 
quent analysis. 

From the analysis of the data, no signifi- 
cant differences were found between sex, be- 
tween sequence of presentation, between first 
and second halves of groups feeling the same 
set of loaves; or between the first and second 
halves of the total group during the experi- 
mental day. The difference between the ob- 
served and expected frequencies of judgments 
for the four wrappers gave a chi-square value 
of 26.38 which is significant beyond the 1% 
level with 3 degrees of freedom. The per- 
centage of judgments of “fresher” made by 
these 32 Ss were as follows: Cellophane, 
68%; Saran, 56%; regular wax, 42%; and 
the special wax with subwrapper, 34%. The 
percentages were determined on the basis of 
the number of judgments in favor of a par- 
ticular wrapper over the total number of 
times that wrapper appeared in the judgment 
pairs. Since a full paired-comparison design 
was employed for the four wrappers, the per- 
centages add up to 200%. 
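
The 200% total is a property of the design rather than of the data. In a full paired comparison of k wrappers each wrapper appears in k - 1 of the k(k - 1)/2 pairs, and every pair yields exactly one judgment of "fresher," so the percentages must sum to 100 x k/2. A short sketch (the function name is an assumption made for illustration):

```python
from itertools import combinations

def percentage_total(n_items: int) -> float:
    """Sum over items of (judged fresher / times presented), in percent."""
    pairs = list(combinations(range(n_items), 2))
    appearances_per_item = n_items - 1   # each item meets every other item once
    wins_total = len(pairs)              # exactly one "fresher" per pair
    return 100.0 * wins_total / appearances_per_item

# Percentages reported for the four wrappers of the original study.
reported = {
    "Cellophane": 68,
    "Saran": 56,
    "Regular wax": 42,
    "Special wax with subwrapper": 34,
}
print(percentage_total(4), sum(reported.values()))
```

The same reasoning gives a total of 150% for a design with three wrappers.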

The original study left a number of questions 
unanswered and raised some others. 
The purpose of this second study is to 
answer two of these questions: 
(a) Will the same differential influence 
of wrappers on the perception of freshness in 
bread be found among the primary consumer 
group, housewives, as with university 
students? (b) Will the same differential 
influence of the wrappers on the perception 
of freshness in bread hold for one- and 
two-day-old bread as it does for fresh bread? 


Procedure 


In order to answer these questions, three con- 
ventional type wrappers were selected: cellophane; 
cellophane with a five-inch waxed paper insert band; 
and wax. The cellophane wrappers were of the commonly 
used .001-inch thickness and weighed approximately 
one pound per 21,000 square inches. The 
waxed paper for the wrapper and for the insert band 
was of base paper weighing 25 pounds per ream and 
waxed up to 37 pounds per ream. All wrappers 
were unprinted. 

Eighteen regular, one-pound, round-top, sliced 
loaves of white bread were used. Six of these were 
fresh, having been baked at the same time on the 
afternoon before the experiment; six were one day 
older; and six were two days older. All of the loaves 
were stored in their original wrappers until a few 
hours before the experiment when they were re- 
wrapped in the various wrappers for the experiment. 
The wrappers were adjusted to a degree of tightness 
(or looseness) judged to be comparable. 

The experiment was performed in a laboratory 
type situation set up in the foyer of a large super 
market. Fifty of the housewives coming to the 
market to shop volunteered to serve as Ss. 

The Ss were tested under blinded conditions made 
possible by seating each S parallel with the side of a 
table and close enough to place her right forearm 
and hand behind a screen which was mounted to the 
table. This arrangement made it possible to place 
the loaves, one at a time, under the S’s hand and 
prevented the S from seeing the loaves being judged. 

Each S was instructed that this was an experiment 
to determine whether or not people can tell how 
fresh bread is by feeling it; that one loaf and then 
another would be presented under the S’s hand; that 
she should feel one and then the other and tell the 
experimenter which of the two was fresher. No 
equal judgments were allowed. 


Table 1 


Judgments of “Fresher” Made for Each of Three 
Wrappers and Three Ages of Bread When 
Presented in a Paired-Comparison 
Design to 50 Housewives 


                                                  Age of Bread 
Wrapper                              Fresh           One-day         Two-day 

Cellophane                           (125) 62.5%     (122) 61.0%     (126) 63.0% 
Cellophane with waxed band insert    (118) 59.0%     (97) 48.5%      (94) 47.0% 
Wax                                  (57) 28.5%      (81) 40.5%      (80) 40.0% 





Table 2 


Chi-Square Values Between Observed and Expected 
Frequencies of Judgments of “Fresher” Made 
for Each of Three Wrappers and 
Three Ages of Bread 


Variables                                                                  df    χ² Values 

Between first and second halves of sets of loaves used 
Between sets of loaves used in the first and second halves of the experimental day 
Between fresh and one-day-old bread 
Between fresh and two-day-old bread 
Between one- and two-day-old bread 
Between observed and expected frequencies of judgments for three wrappers for: 
    Fresh bread 
    One-day-old bread 
    Two-day-old bread 
    All three ages of bread 


* Significant at the 5% level. 
** Significant at the 1% level. 


A full paired comparison design was used and the 
pairs were randomly presented according to a sys- 
tem previously worked out for each S. The re- 
sponses of the Ss were recorded on the individual 
record sheets for subsequent analysis. 


Results 


The numbers and percentages of judgments 
of “fresher” made for each of the three wrap- 
pers when presented in a full paired compari- 
son design to the 50 housewives under blinded 
conditions are presented in Table 1. The per- 
centages are determined on the basis of the 
number of judgments in favor of a particular 
wrapper over the total number of times that 
wrapper appeared in the judgment pairs. 
Since a full paired-comparison design was 
employed for the three wrappers, the percent- 
ages add up to 150%. 

A full analysis of the data was made by the 
chi-square technique. The results of these 
analyses are to be found in Table 2. No significant 
differences were found between the 
frequencies of judgments as “fresher” for the 
first and second halves of Ss tested on a set 
of loaves in the various wrappers. Because 
some differences might have arisen between 
the sets of loaves used in the first half and 
those used in the second half of the experimental 
day, a χ² test was made on these data. 
No significant differences were found. With 
8 degrees of freedom and an alpha equal to 
.05, a single χ² value of 15.5 or more would 
be necessary to indicate a difference that was 
not due to chance variations in 95 out of 100 
cases. It will be noted from Table 2 that the 
χ² values obtained are clearly below this value 
for significant differences. For an interpretation 
of the remainder of the tests reported in 
Table 2, at the 95% confidence level a χ² of 
5.99 or greater is required, and at the 99% 
confidence level a χ² of 9.21 or greater is required. 
These values are for two degrees of 
freedom. 
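
A goodness-of-fit test of this kind can be sketched from the counts in Table 1. The sketch assumes that the expected frequency for each wrapper is an equal share of the 300 judgments made at each age of bread; the report does not spell out how its expected frequencies were derived, so the values computed here need not agree with those of Table 2. The critical values are the ones quoted in the text for two degrees of freedom.

```python
def chi_square_equal_expected(observed):
    """Goodness-of-fit chi-square against equal expected frequencies."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)

# Counts of "fresher" judgments from Table 1, in the order:
# cellophane, cellophane with waxed band insert, wax.
table_1_counts = {
    "Fresh": [125, 118, 57],
    "One-day": [122, 97, 81],
    "Two-day": [126, 94, 80],
}

CRITICAL_5_PERCENT = 5.99  # 2 df, as quoted in the text
CRITICAL_1_PERCENT = 9.21  # 2 df, as quoted in the text

results = {age: chi_square_equal_expected(c) for age, c in table_1_counts.items()}

for age, chi2 in results.items():
    if chi2 >= CRITICAL_1_PERCENT:
        verdict = "beyond the 1% level"
    elif chi2 >= CRITICAL_5_PERCENT:
        verdict = "beyond the 5% level"
    else:
        verdict = "not significant"
    print(f"{age:8s} chi-square = {chi2:6.2f}  ({verdict})")
```

Under the equal-expectation assumption, the departure from chance is largest for fresh bread and smaller, though still beyond the 5% criterion, for the two older ages.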


Discussion 


The purpose of this study was to ascertain 
the answers to two questions. The first ques- 
tion was: Will the same differential influence 
of wrappers on the perception of freshness in 
bread be found among the primary consumer 
group, housewives, as with university students? 
The plain cellophane and the plain 
waxed wrappers were identical in the two 
studies. The percentages of judgments of 
“fresher” for the cellophane wrappers on fresh 
bread were 68.0% and 62.5% respectively for 
students and housewives. The corresponding percentages 
for the plain wax wrappers on fresh bread were 
42% for students and 28.5% for housewives. 
These percentages are not strictly 
comparable because the percentage for a given 
wrapper depends upon the other wrappers in 
the group. Not all the wrappers were the 
same in the two studies. However, it will 
be noted that the order or ranking is the 
same for cellophane and wax in both studies. 
Therefore, for cellophane and wax, it can be 
concluded that the primary consumer group— 
housewives—responded in the same way as 
did university students. It may also be con- 
cluded with a very high degree of confidence 
that fresh bread (of equal freshness) feels 
fresher when it is wrapped in cellophane than 
when it is wrapped in a wax wrapper. 

The second question was: Will the same 
differential influence of the wrappers on the 
perception of freshness in bread hold for one- 
and two-day-old bread as it does for fresh 
bread? An examination of the percentages 
of “fresher” in Table 1 shows some change in 
the magnitude of the judgments of “fresher” 
for the wax wrapper and for the cellophane 
wrapper with the wax band insert with one- 
and two-day-old bread. Although the orders 
remain the same, there is a significant differ- 
ence between the percentages on fresh bread 
and those for one-day-old bread. Likewise 
there is a significant difference between fresh 
bread judgments and two-day-old bread judg- 
ments for the various wrappers. The chi- 
square values are given in Table 2 and were 
found to be significant at the 95% level of 
confidence. No significant differences were 
found between the one- and two-day-old bread 
in percentages of judgments. All three ages 
of bread showed significant deviations from 
chance expectancies for the three wrappers. 
It may be concluded, therefore, that, like fresh 
bread, one- and two-day-old bread also feels 
fresher when wrapped in a plain cellophane 
wrapper than when wrapped in wax or cello- 
phane with a wax band insert. 


Summary and Conclusions 


The purpose of this experiment was to an- 
swer two questions with reference to the in- 
fluence of tactual sensations supplied by the 
wrapper on the perception of freshness in 
bread. Previous research by the author had 
revealed that for fresh bread, loaves of equal 
freshness were perceived by university stu- 
dents to be fresher when wrapped in cello- 
phane than when wrapped in wax. In this 
study the following questions were asked: 
(a) Will the same differential influence of 
wrappers on the perception of freshness in 
bread be found among the primary consumer 
group, housewives, as with university students? 
(b) Will the same differential influence 
of the wrappers on the perception of 
freshness in bread hold for one- and two-day-old 
bread as it does for fresh bread? 

In order to answer these questions, three 
conventional type wrappers and three ages of 
bread in a paired-comparison design were presented 
to 50 housewives under blinded conditions. 
The results warrant the following 
conclusions: 

1. Housewives, the primary consumer group, 
responded to the test situation in the same 
way that university students responded. They 
perceived fresh bread of equal freshness to 
be fresher when wrapped in cellophane than 
when wrapped in wax. 

2. The same differential influence of the 
wrappers on the perception of freshness in 
bread applies to one- and two-day-old bread 
as it does to fresh bread. The magnitude of 
the judgments was not as great for one- and 
two-day-old bread, but the order remained 
the same and judgments still differed significantly 
from expected frequencies. 


While numerous investigations have been 
made of the stability of Strong Vocational In- 
terest Blank (SVIB) scores, relatively little 
has been done in studying factors that might 
be predictors of interest permanence. If it is 
possible to determine which individuals could 
be expected to have stable vocational interests 
and which individuals could be expected to 
have unstable interests, the validity of using 
the SVIB in educational-vocational counsel- 
ing would be enhanced and time and money 
spent in retesting individuals with stable in- 
terests would be saved. The purpose of the 
study was to determine which of 12 types of 
information available about a college fresh- 
man are useful in predicting SVIB profile 
permanence. The SVIB profile stability meas- 
ure was one developed by the investigator and 
called an S score. A description of the de- 
velopment of the S score as well as normative 
data and a study of relationships between S 
and other profile stability measures has been 
previously published (4). 


Method 


Subjects. The 242 subjects of this study were all 
the male high school graduates who entered the Gen- 
eral College of the University of Minnesota in the 
fall quarter, 1954, as freshmen, took the SVIB dur- 
ing the orientation-registration program prior to the 
start of classes, and completed their third quarter in 
the spring quarter, 1955. The subjects ranged in age 
from 16.5 to 27.5 years at the time of the fall ad- 
ministration of the SVIB with a median age of 19.0, 
a mean age of 20.0, and a standard deviation of 2.6. 
The range in high school percentile ranks (HSR) 
was from 1 to 77 with a mean rank of 27.0 and a 
standard deviation of 16.4. The SVIB Interest-Ma- 
turity (I-M) mean score was 48.8 with a standard 
deviation of 7.6. The General Aptitude Test Bat- 
tery (GATB) G mean score was 107.0 with a stand- 
ard deviation of 11.3. The American Council on 
Education Psychological Examination (ACE) mean 
score (1952 form) was 77.2 with a standard deviation 
of 18.0. Eighty-eight subjects (36.4%) were 
veterans and 31 (12.8%) were married. 

1 This paper is based upon a portion of a Ph.D. 
thesis submitted to the graduate faculty of the University 
of Minnesota. The author wishes to acknowledge 
the guidance of his advisors, Willis E. 
Dugan and Cyril J. Hoyt. 

Procedures. The subjects were retested on the 
SVIB during the latter part of the spring quarter of 
1955. The interval between SVIB administrations 
averaged nine months. All SVIB’s were scored on 
44 occupational scales and the I-M scale. 

The two general types of statistical analysis used 
in studying the relationship between each of the 12 
factors and the S scores were product-moment cor- 
relations and analysis of variance. Correlational 
analysis was used for the following eight factors: 
(a) age, (b) I-M scores, (c) GATB G scores, (d) 
number of Primary (P) patterns, (e) number of Reject 
(R) patterns, (f) ACE scores, (g) HSR, and 
(h) Depth index (developed by Hoyt, Levy, and 
Smith [3]). 

The analysis of variance technique was used for 
the following four factors: (a) socioeconomic status, 
(b) congruence of stated occupational goal with 
measured interest pattern, (c) veteran status, and 
(d) marital status. The hypothesis of homogeneity 
of variance was tested for each factor before the 
analysis of variance was made. 

The socioeconomic status of each student was de- 
termined by utilizing Edward’s Social-Economic Scale 
(2, pp. 176-180). This scale classifies a large num- 
ber of occupations into six categories on the basis 
of the social prestige and economic level of each oc- 
cupation. Each student’s parental occupation was 
classified into one of the six categories and analysis 
of variance was made for the six categories on the 
mean S scores. 

The relationship between a student’s stated occu- 
pational goal, measured interest pattern, and inter- 
est stability was investigated in the following man- 
ner: (a) Each student’s stated occupational goal was 
classified into one of the 11 SVIB groups. (b) After 
the occupational choices of the students were classi- 
fied into a specific SVIB group, such as Group I, the 
students in each group were classified into four cate- 
gories according to whether or not they had a Pri- 
mary, Secondary, Tertiary or Reject pattern in that 
group. (c) The S scores for the four categories were 
tested for equality by the analysis of variance. 


Results 


Correlational analysis. The correlation coefficients 
for only two factors, number of P 
patterns (r = -.17) and Depth index (r = .24), 
were significant at the .01 level. Two 
other factors, number of R patterns (r = -.12) 
and HSR (r = -.11) had coefficients 
which approached, but did not reach, the .05 
significance level. The three factors which 
appeared to be most promising for predictive 
purposes, Depth index, number of P patterns, 
and number of R patterns, were included in 
a multiple regression equation. The multiple 
correlation coefficient was .29 which is signifi- 
cant at the .01 level of confidence. Because 
a multiple R of .29 has little predictive ca- 
pacity, it is concluded that the eight factors 
used in correlational analysis do not indi- 
vidually or when optimally combined enable 
one to predict with a reasonable degree of 
confidence the interest stability for an indi- 
vidual case. 
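
The significance statements can be checked against the usual t test for a product-moment correlation, t = r sqrt(N - 2) / sqrt(1 - r²), with N = 242. The critical values below are approximate two-tailed values for 240 degrees of freedom; that the tests were two-tailed is an assumption. The last lines show why a multiple R of .29 carries little predictive weight.

```python
import math

N = 242  # subjects in the sample

def t_for_r(r: float, n: int) -> float:
    """t statistic for testing a product-moment correlation against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

T_CRIT_05 = 1.97  # approximate two-tailed critical t, df = 240 (ASSUMED two-tailed)
T_CRIT_01 = 2.60  # approximate two-tailed critical t, df = 240

reported = {
    "number of P patterns": -0.17,
    "Depth index": 0.24,
    "number of R patterns": -0.12,
    "HSR": -0.11,
}

t_values = {name: t_for_r(r, N) for name, r in reported.items()}
for name, t in t_values.items():
    if abs(t) >= T_CRIT_01:
        level = ".01"
    elif abs(t) >= T_CRIT_05:
        level = ".05"
    else:
        level = "not significant"
    print(f"{name:22s} r = {reported[name]:+.2f}  t = {t:+.2f}  ({level})")

variance_explained = 0.29 ** 2
print(f"multiple R = .29 accounts for {variance_explained:.1%} of the variance")
```

A multiple R of .29 corresponds to less than one-tenth of the variance in S scores, which is consistent with the conclusion that prediction for the individual case is not dependable.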

Analysis of variance. The analysis of vari- 
ance for the socioeconomic status, veteran 
status, and marital status factors indicated 
in each case acceptance of the hypothesis of 
equality of mean scores. In analyzing the re- 
lationship between congruence of stated occu- 
pational goal with measured interest pattern 
and S scores, the analysis of variance was 
made for SVIB Groups I, II, IV, V, VIII, IX, 
and X. The analysis could not be run for 
Groups III, VI, VII, and XI because there 
were five or fewer cases for each of these groups. 
The hypothesis of equality of mean S scores 
was accepted for all groups analyzed except 
Group VIII. An example of the application 
of the results is as follows: If a student states 
he wants to be an accountant or selects any 
other business occupation in or closely related 
to Group VIII and has a P pattern for that 
group, he will likely have greater measured 
interest permanence than a student who 
claims accounting as his goal but does not 
have a P pattern for Group VIII. Since only 
44 students (18.8%) had a stated choice in 
Group VIII, this factor is of very limited predictive
value when the entire sample is considered.


Discussion

The negative results for the I-M scale as a 
predictor are in agreement with the results 
reported by Stordahl (7) and Powers (5) but 
contradict the results reported by Strong (8, 
p. 281). The results for ACE and HSR 


Leslie A. King 


are similar to those found by Stordahl (6). 
While Strong (9, p. 91) has found evidence 
to substantiate the expectation that interests 
become more stable with increasing age, the 
results for the investigator's sample do not
support the assumption. A nonsignificant re- 
lationship between intelligence and interest 
stability is in contrast with Cisney’s (1) re- 
port for high school students. 

The most promising of the 12 factors 
studied is the Depth index. Use of the index 
as a predictor is based on the theory that 
consistency and integration of interests are 
necessary conditions for stability and that 
these conditions can be measured by “ex- 
pected” or “unexpected” patterns. Hoyt, 
Levy, and Smith (3) found a correlation of 
— .33 between interest stability as measured 
by rank correlation (rho) and the Depth in- 
dex for a group of high school seniors retested 
two years later when they were college sopho- 
mores, and a correlation of — .37 for a sam- 
ple of the same group retested four years later 
when they were college seniors. (The reason 
for the difference between Hoyt, Levy, and 
Smith’s and the investigator's correlation co- 
efficient signs is that a low rho indicates a 
relatively unstable profile.) The problem of 
predicting vocational interest profile perma- 
nence remains unsolved. The investigator 
suggests the following problems for further 
research: (a) the relationship of personality 
characteristics to interest stability; (b) measurement
of individuals' understanding of SVIB
item content; (c) intensive case studies, in- 
cluding interviews, of a sample of cases who 
exhibit little interest change (such factors as 
perception of interest changes, occupational 
information, work and school experience, 
hobbies, health, family pressures, and voca- 
tional interests of friends could be investi- 
gated); (d) longitudinal studies of the origin 
and development of interests. 


Summary 


The value of 12 factors in predicting SVIB 
profile stability of college freshmen was in- 
vestigated. No significant relationship was 
found between stability and age, I-M scores, 
GATB G scores, number of R patterns, ACE 
scores, HSR, socioeconomic status, veteran
status, and marital status. A significant relationship
existed between stability and two
factors: number of P patterns and Depth in- 
dex. Congruence of stated occupational goal 
in business with a P pattern in SVIB group 
VIII was also significantly related to stability. 
However, the predictive value of the signifi- 
cant factors when taken individually or when 
optimally combined was of such a low order 
that they cannot be used with a reasonable 
degree of confidence in predicting interest sta- 
bility. Suggestions for further research were 
also made.


It was believed that the time spent in answering
personality inventories might have
significance in the measurement of person- 
ality. Since inventories which ask a subject 
to answer questions about himself admittedly 
produce answers which are influenced by the 
motivation of the subject, a time measure- 
ment would be less liable to bias than the 
trait scores ordinarily derived. It was hy- 
pothesized that individuals who are insecure, 
indecisive, or poorly adjusted at the time of 
applying for a sales job would consume more 
time, relatively, on tests composed of volun- 
tary commitment items which are likely to be 
tension producing than on tests composed of 
problem-solving items. It was also hypothe- 
sized that these individuals would be less suc- 
cessful as salesmen than others. 

Since reaction time (time spent in reacting 
to complex choice questions) and reading 
speed are not the variables with which we are 
concerned, it would be desirable to control 
these variables. In the present study there is 
no independent measure of them but they 
were to some degree controlled. 

Opportunity to test this hypothesis by cor- 
relating time expenditures against an accept- 
able independent criterion came in 1957. A 
leading national electronics sales firm whose 
entire sales organization of 226 men had been 
tested supplied success rankings by regional 
and district sales managers. These data were 
based on actual sales performance corrected 
by the raters for experience, fertility of re- 
gion, and other aspects of sales opportunity 
known to the managers. The group included 
the present force of 171 sales engineers and 
55 men who had been terminated because of 
failure to sell. The entire group was ranked, 
giving the 55 separated men all a rank of zero. 

Each of these men had completed a battery
of tests requiring about 7 hours and including
the Strong Vocational Interest Blank, the
Bernreuter Personality Inventory, an inven- 
tory of neurotic tendencies, a selling aptitude 
test (actually an inventory containing per- 
sonality, interest, and values questions), the 
Bennett Mechanical Comprehension Test, a 
vocabulary test and The Personnel Labora- 
tory Power Intelligence Test (no time limit). 
These tests were presented assembled in book- 
lets so that the subject went directly from one 
to another of the tests. He was instructed to 
record the time at the beginning and the end 
of each questionnaire. 

Two types of statistics were used in this 
study: (a) The total time spent on the tests 
(referred to here as absolute time), and (b)
a ratio between times spent on intellective 
tests and times spent on inventories. This 
ratio is believed to produce some degree of 
correction for differences in reading speed and 
reaction time since both inventories and intel- 
lective tests require considerable reading. 
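
The relative-time statistic can be sketched as a simple quotient of summed test times. The example below is a minimal illustration under stated assumptions: the function name, the dictionary keys, and the minutes shown are all hypothetical, and the grouping of tests into "intellective" and "inventory" sets is only one of the combinations the study examined.

```python
from typing import Dict, Iterable


def time_ratio(minutes: Dict[str, float],
               numerator: Iterable[str],
               denominator: Iterable[str]) -> float:
    """Ratio of total time on the numerator tests to total time on the denominator tests."""
    top = sum(minutes[name] for name in numerator)
    bottom = sum(minutes[name] for name in denominator)
    if bottom <= 0:
        raise ValueError("denominator time must be positive")
    return top / bottom


# Hypothetical recorded times (in minutes) for one examinee.
example_minutes = {
    "vocabulary": 20.0,
    "power_intelligence": 40.0,
    "selling_aptitude": 30.0,
    "strong_vib": 45.0,
}

INTELLECTIVE = ("vocabulary", "power_intelligence")
INVENTORIES = ("selling_aptitude", "strong_vib")

ratio = time_ratio(example_minutes, INTELLECTIVE, INVENTORIES)
print(f"intellective / inventory time ratio: {ratio:.2f}")
```

Because both sums are taken from the same examinee, a uniformly slow reader inflates numerator and denominator alike, which is the sense in which the ratio is thought to correct for reading speed.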


Absolute Time Consumption 


When absolute time consumed on each of 
these tests is correlated with sales success of 
men in the field, the Flanagan-method Pear- 
sonian Coefficients (1) shown in Table 1 are 
obtained. 

As anticipated, these absolute time correla- 
tions are of negligible magnitude and suggest 
that if time consumption data have any pre- 
dictive significance it has been covered up by 
uncontrolled factors. 


Relative Time Consumed 


Table 1

Correlation Between Sales Success and Time Consumed
on Each of Seven Power Tests

(N = 221 electronics salesmen)

Test                                      r*

Strong Vocational Interest Blank         -.14
Bernreuter Personality Inventory         -.05
Sales Personality Inventory              -.07
General Adjustment Inventory             -.02
Bennett Mechanical Comprehension Test     .03
Power Test of Intelligence                .00
Vocabulary Test                           .02

* Values in boldface are statistically significant at the 95%
confidence level.


Accordingly, raw absolute time figures were
transmuted into ratios, using the time consumed
by each examinee on personality and
interest inventories as a numerator, and time
spent on "intellective" tests as a denominator.
Various combinations of tests implementing
this idea were investigated, as shown in
Table 2, some having "intellective" tests as
numerator.

Inspection of Table 2 reveals statistical sig- 
nificance for 19 of the 28 coefficients, and a 
number of the figures are large enough to 
give substantial support to the hypothesis.

When we investigate various combinations
of time scores, we find support for the hypothesis.
The ratio

(Vocabulary Test Time + Power Intelligence Test Time) /
(Selling Aptitude Test Time + Vocational Interest Blank Time)

correlated with a criterion of sales success,
r = .42. The largest single coefficient in the
matrix (— .49) is produced by a ratio which 
has time for sales interest and sales person- 
ality as its numerator and total time for all 
tests as its denominator. 

It should be noted that time for all seven 
tests represents approximately one standard 
day of testing. Since the reliability of test 
response speed behavior probably increases as 
a function of the Spearman-Brown prophecy 
formula, it is not surprising that increased 
reliability of the (whole day’s speed) de- 
nominator should make for a more reliable 
ratio which can, in turn, correlate higher with 
an independent criterion than can a less re- 
liable (fractional day’s speed) ratio. 
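
The Spearman-Brown prophecy formula referred to above predicts the reliability of a measure lengthened by a factor k from the reliability r of the original measure. The sketch below is illustrative only; the study reports no reliability figures for the time scores, so the values used here are hypothetical.

```python
def spearman_brown(r: float, k: float) -> float:
    """Predicted reliability when a measure of reliability r is lengthened k times."""
    denominator = 1.0 + (k - 1.0) * r
    if denominator == 0:
        raise ValueError("formula is undefined for this combination of r and k")
    return k * r / denominator


# Hypothetical case: a half-day time score with reliability .50,
# doubled in length to a whole day's testing.
half_day_reliability = 0.50
whole_day_reliability = spearman_brown(half_day_reliability, 2)
print(f"predicted whole-day reliability: {whole_day_reliability:.3f}")
```

For any positive starting reliability below 1, lengthening the measure (k > 1) raises the predicted reliability, which is the argument made for the whole-day denominator.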


Table 2 


Correlations Between Sales Success and Relative Time (Expressed in Ratios) for Certain Tests 
and Combinations of Tests 


(N = 221 electronics salesmen) 











A—Vocabulary Test 

B—Power Test of Intelligence 
C—Bennett Mechanical Comprehension 
D—General Adjustment Inventory 
E—Selling Aptitude Inventory 

F —Bernreuter Personality Inventory 
G—Strong Vocational Interest Blank 
T—Total of A+B+C+D+E+F+G 


A+B_ 


G Al 


D+E_ 
G 


A+B+C _ 
D+E+F+G 


ll 


36 





Ratio Numerator 


F G 


29 

.09 18 
‘ 26 04 21 
17 —.15 — 48 —.15 


A+B _ 
T 


—.42 
19 


E+G 
a = — 49 
A+B 


+o" * 








Note: Coefficients in boldface are statistically significant at the 95% confidence level. The fact that the largest
correlations were produced by ratios in which the denominator was the total time on all tests is believed to be due to the greater
stability of this statistic.


The best prediction of sales success from a
single test time ratio was obtained by dividing
the raw time score on the selling aptitude
test by the total raw time score for all seven
tests. In keeping with the original hypothesis,
this finding suggests that the individual
who is indecisive, poorly adjusted or insecure 
with regard to the sales field and who is ap- 
plying for a sales job becomes more emotional 
in responding to these questions and hence 
spends a disproportionate amount of time on 
them. 


Summary 


It was hypothesized that the individual 
with many misgivings and anxieties about his 
suitability and/or prospects in the sales field 
would spend relatively more time answering 
inventory questions concerned with sales per- 


A. R. Yeslin, L. N. Vernon, and W. A. Kerr 


sonality and sales interest than on problem- 
solving questions not directly related to sales. 
It was assumed that such misgivings and 
anxieties would militate against his success
as a salesman. Two hundred and twenty-six 
electronics sales engineers were ranked ac- 
cording to sales success. These rankings were 
correlated with ratios of time spent on inven- 
tories divided by time spent on all tests. A 
number of significant correlations were produced,
the highest resulting from inventories
measuring sales personality or sales interest
as related to total time on all tests.


The present study is concerned with the relation
between the 15 variables which the
EPPS purports to measure and a series of 
self-ratings on these same variables. 


Method 


The Ss of the study were 96 graduate students 
drawn at random from approximately 160 students 
who were attending a course in education. Thirty- 
three of the Ss were males; 63 were females. The 
median age of the Ss was 29; the age range extended 
from 19 to 54 years. 

The following three instruments were administered 
to the Ss at the beginning and at the end of the 
course which they were attending: (a) the EPPS; 
(b) a self-rating scale of 15 items based on the vari- 
ables which the EPPS purports to measure; and (c) 
an ideal self-rating scale based on the same 15 vari- 
ables. The rating categories used for these rating 
scales were derived from the descriptions of the 
variables measured by the EPPS as given in the 
EPPS Manual (1); they are reproduced in Table 1. 
On the self-rating scale the Ss were asked to rate 


Table 1
The Relationship of EPPS Variables to
Self-Rating Categories

EPPS Variables     Self-Rating Categories

Achievement        Ability to get things done
Deference          Interest in the opinions of others
Exhibitionism      Standing out in the group
Order              Being neat and orderly
Autonomy           Being independent of the opinions of others
Affiliation        Loyalty to others
Introception       Interest in the feelings of others
Succorance         Dependence on others
Dominance          Dominance in social relations
Abasement          Being timid and feeling inferior
Nurturance         Helping others in trouble
Change             Interest in having novel experiences
Endurance          Completing tasks that are undertaken
Heterosexuality    Interest in the opposite sex
Aggression         Being aggressive toward others





1 The author wishes to thank Barry L. Levin for 
making available some of the data analyzed in this 
paper. 


themselves as they actually are; on the ideal self- 
rating scale they were asked to rate themselves as 
they wished they were. The Ss were encouraged to 
answer the questionnaires as honestly as possible. 
They were assured that all test responses would be 
held in strict confidence and would be used for sci- 
entific purposes only. 


Results 


Table 2 presents the reliability coefficients 
obtained from the present data as well as 
those given in the EPPS Manual (1). The 
test-retest reliability coefficients of the self- 
ratings and of the ideal self-ratings are also 
supplied. The reliability coefficients given by 
Edwards for the EPPS are somewhat higher 
than those found in the present study. This 
discrepancy may be due to the difference in 
the interval between test and retest for the 
two sets of data. Edwards reports an inter- 
val of one week between test and retest; the 
present study was based on a three-week in- 
terval. It should be noted, however, that 


Table 2 


Test-Retest Reliabilities of EPPS Scores, Self-Ratings 
and Ideal Self-Ratings 


EPPS 
Edwards Mann 
Data Data 


Ideal 
Self- 
Rating 


Self 


Variable Rating 


Achievement ; 64 33 49 
Deference : 87 35 
Order , 77 55 
Exhibitionism ; 71 66 
Autonomy . 76 

Affiliation d 55 
Introception ; 67 
Succorance j 72 
Dominance d 73 
Abasement ; .69 
Nurturance ‘ 59 

Change d 86 
Endurance . 77 
Heterosexuality  . 85 


Aggression ‘ .80


Table 3

Correlation Coefficients Between EPPS Variables and
Self- and Ideal Self-Ratings

                   EPPS and       EPPS and Ideal
Variable           Self-Rating    Self-Rating

Achievement         .12            .19
Deference           .28*           .04
Order               [illegible]    .02
Exhibitionism       .06            .02
Autonomy            .13            .18
Affiliation         .00            .00
Introception        .07            .02
Succorance          .22*           .03
Dominance           .26*           .04
Abasement           .39*           .10
Nurturance          .34*
Change              .42*
Endurance           .41*
Heterosexuality     .40*
Aggression          .24*

* Significant at the .05 level.





Klett (2) found in an independent study that 
the split-half reliability coefficients of the 
EPPS were also somewhat lower than the 
corresponding coefficients reported by Ed- 
wards in the EPPS Manual (1). Even these 
lowered coefficients, however, are reasonably 
high for test reliability of a personality test. 

Table 3 presents the correlations between 
EPPS scores and self- and ideal self-ratings 
obtained from the set of data collected at the 
beginning of the Education course. As indi- 
cated in the table, 10 of the 15 coefficients be- 
tween EPPS variables and self-ratings were 
found to be significant; 14 were positive in 
direction, one was found to be .00. To find 
this many positive relationships is highly sig- 
nificant since it would occur by chance less 
than once in a thousand times. 
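
The chance probability cited above can be checked with a sign test. The sketch below assumes, as the source does not state, that the .00 coefficient is set aside and that each of the remaining 14 coefficients is equally likely to be positive or negative under the null hypothesis; the function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(successes: int, trials: int, p: float = 0.5) -> float:
    """Probability of at least `successes` successes in `trials` independent trials."""
    return sum(
        comb(trials, k) * p ** k * (1.0 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# 14 non-zero coefficients, all 14 positive.
one_tailed = binomial_upper_tail(14, 14)
two_tailed = min(1.0, 2.0 * one_tailed)
print(f"one-tailed probability: {one_tailed:.6f}")
print(f"two-tailed probability: {two_tailed:.6f}")
```

Under these assumptions both probabilities fall below .001, in line with the statement that such a result would occur by chance less than once in a thousand times.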

Table 3 also indicates that only one of the 
15 correlations between EPPS variables and 
ideal self-ratings was found to be significant.
Since one significant correlation coefficient in
20 would be expected to occur by chance, the 
relationship between EPPS scores and ideal 
self-ratings appears to be negligible. This 
finding is not surprising since one component 
which enters into ideal self-ratings is social 
desirability, and the EPPS is designed to 
eliminate the component of social desirability 
as a factor in test response. 
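
The chance expectation invoked here can be made explicit. The sketch below treats the 15 significance tests as independent, which is an assumption made for illustration (the EPPS scales are ipsative, so the tests are not strictly independent); the function names are hypothetical.

```python
def expected_chance_significant(n_tests: int, alpha: float) -> float:
    """Expected number of nominally significant results when every null hypothesis is true."""
    return n_tests * alpha


def prob_at_least_one(n_tests: int, alpha: float) -> float:
    """Probability of one or more significant results among independent true-null tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


n_tests, alpha = 15, 0.05
print(f"expected by chance: {expected_chance_significant(n_tests, alpha):.2f}")
print(f"P(at least one):    {prob_at_least_one(n_tests, alpha):.3f}")
```

With 15 tests at the .05 level, the expected number of chance results is .75, and the probability of at least one is a little over one half, so a single significant coefficient carries little weight.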


Discussion and Summary 


The findings of the present study support 
the conclusions that: (a) the EPPS has satisfactory
test-retest reliability; (b) the EPPS
correlates with self-ratings on the variables 
which it purports to measure; (c) the EPPS 
does not correlate with ideal self-ratings on 
the variables which it purports to measure. 

In order to interpret these findings it is im- 
portant to note that the categories on which 
the Ss rated themselves were arbitrarily se- 
lected and formulated from the description of 
the variables supplied by Edwards in the 
EPPS Manual. This formulation was neces- 
sarily a crude procedure since Edwards sup- 
plies a whole paragraph of description for 
each variable, and this paragraph could not 
easily be summarized in a few words to form 
the rating category. With this difficulty in 
mind, and considering the range of personal 
interpretation that is likely to occur among 
different individuals rating themselves on such 
highly abbreviated category headings, it is 
indeed surprising that the present results 
were as decisive as they proved to be. 


Received October 25, 1957. 


References 


1. Edwards, A. L. Edwards Personal Preference 
Schedule manual. New York: Psychological 
Corp., 1954. 

2. Klett, C. J. Performance of high school students 
on the Edwards Personal Preference Schedule. 
J. consult. Psychol., 1957, 21, 68-72. 





Journal of Applied Psychology 
Vol. 42, No. 4, 1958 


A Manifest Structure Analysis of the Otis S-A Test of 
Mental Ability, Higher Examination: Form B 


Frank M. du Mas 


Montana State University 


and King MacBride 


Michigan State University 


The purpose of this paper is to show how 
a new method called manifest structure analy- 
sis can be used to shorten a standardized test. 

Tests constructed by conventional methods 
are often considered to be well-developed 
measuring instruments. They are usually 
composed of many items with a fairly long 
time limit. There arises in practical situa- 
tions the need for tests which are sufficiently 
accurate for certain requirements and at the 
same time not too demanding of a testee’s 
time. These practical considerations con- 
stantly arise in the selection and evaluation 
of personnel in business and industry. One 
form of the Otis S-A Test of Mental Ability 
has been shortened by Wonderlic (3) in or- 
der to meet the practical needs of business 
and industrial organizations. Wonderlic built 
a 50-item test out of the longer Otis Test, 
which required 12 instead of 30 minutes of 
the testee’s time. His approach was to apply 
classical test methods to further reduce a test 
already constructed by means of classical test 
methods. This shortened form of the fully 
developed test is widely used in business and 
industry. Such shortened tests satisfy a real 
need in many practical situations. 

The new method, manifest structure analy- 
sis (1), was not developed specifically for the 
purpose of shortening a test. It is a quite 
general scale or test theory and method. The 
shortening of a test is simply one of the many 
useful applications of the method. Since 
manifest structure analysis is essentially a 
scale theoretic approach to item analysis and 
other measurement problems, and since clas- 
sical test construction methods are not well- 
formulated scaling methods, it was thought 
desirable to apply the more severe criteria of 
a scaling procedure to a reputable psychological
test. This was done in order to see
whether or not a scale could be extracted
from a set of items already analyzed by clas- 
sical test methods. 


Procedure 
Criterion 


The criterion selected for prediction was 
the total score made on the Otis S-A Test of 
Mental Ability, Higher Examination: Form B 
(2). The total score was derived under the 
condition of the 30-minute limit rather than 
the 20-minute time limit. 


Method 


The method used for the analysis of responses to 
items on the full scale was a new method called 
manifest structure analysis (1). The complete the- 
ory and method, with step-by-step computational 
examples, are given in the reference. The general 
model of manifest structure analysis is shown in 
Fig. 1. The particular scale model that we at- 
tempted to fit empirically was the intensive model 
of manifest structure analysis as shown, for the spe- 
cial case of a 10-item test, in Fig. 2. The intensive 


Response Patterns 














Fig. 1. This is the general model of manifest
structure analysis. Note that columns represent response
patterns, not items. A number in the matrix
is the probability that a response pattern will occur
at a specified level, rank or score.

Fig. 2. This is a special case of the intensive
model with 10 possible values of a criterion and 10
items. In this study, rows represent total scores on
the long form of the Otis and columns represent
scalable items. An x in this matrix represents the
probability value, 1, that an individual who makes
a specified score will answer certain items correctly;
cells without an x represent a probability of zero.


model is roughly equivalent to a cumulative scale 
and is simply a special case of the general model 
shown in Fig. 1. 

Manifest structure analysis permits us to derive 
a score for an individual from (a) his response pat- 
tern to all items or (b) from his responses to single 
items. Scores derived in the first way are obtained 
by finding the particular pattern of item responses 
made by a testee and then assigning him the scale 
value or weight calculated for that response pattern. 
The second method derives a score for an individual 
from the item weights for all items that he answered 
correctly. The second method of deriving a score 
was used in this study. The scale value for each 
item is calculated from the formula 


V = ΣT / N, [1]


where


V = a category scale value,

ΣT = the sum of the total scores made on the long
form of the Otis S-A Test by every individ- 
ual answering that item correctly, 

N = the number of individuals answering the item 
correctly. 


On the basis of the responses made by the standardiza- 
tion sample, a group of items was selected which was 
an empirical analogue of the intensive model shown in 


Fig. 2. A score on the short form was calculated by 
means of 


S = ΣV / n, [2]


where


ΣV = the sum of the category scale values of those
items that an individual answers correctly, 
n = the number of items he answered correctly, 
S = the individual’s score on the shortened test. 


In order to predict a score on the longer test from a 
score on the shorter test, a correction must be made.
This correction is


S' = mS + k, [3]


where 


S’ = the weighted score predicted from the score 
made on the shortened test, 


m = the slope constant, 
S = the score on the shortened test, 
k = the ordinate intercept. 


In this way the prediction of the most probable 
score an individual would make on the longer test is 
possible from the score made on the shorter test. 
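
The scoring steps defined for Formulas [1] to [3] can be sketched as follows. This is an illustration under stated assumptions rather than the authors' computational procedure: it reads Formula [1] as the mean long-form total of those passing an item and Formula [2] as the mean scale value of the items passed, as the accompanying definitions indicate. All function names and data values are hypothetical.

```python
from typing import Sequence


def item_scale_value(totals_of_passers: Sequence[float]) -> float:
    """Formula [1]: mean long-form total score of the individuals who passed the item."""
    if not totals_of_passers:
        raise ValueError("no individual answered the item correctly")
    return sum(totals_of_passers) / len(totals_of_passers)


def short_form_score(values_of_items_passed: Sequence[float]) -> float:
    """Formula [2]: mean scale value of the short-form items answered correctly."""
    if not values_of_items_passed:
        raise ValueError("no item was answered correctly")
    return sum(values_of_items_passed) / len(values_of_items_passed)


def predicted_long_form_score(s: float, m: float, k: float) -> float:
    """Formula [3]: linear correction of the short-form score."""
    return m * s + k


# Hypothetical data: three individuals passed an item with these long-form totals.
v = item_scale_value([50.0, 55.0, 60.0])
# Hypothetical scale values of the three items one testee answered correctly.
s = short_form_score([54.0, 56.0, 58.0])
# Hypothetical slope and intercept.
s_prime = predicted_long_form_score(s, m=2.0, k=-50.0)
print(v, s, s_prime)
```

Because every scale value is itself a mean of long-form totals, the short-form score S necessarily falls within the range of the item scale values, which is why the linear correction is needed to recover the range of the long form.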

Sample 1. This was the standardization sample 
composed of 70 college students. Their responses to 
the 75 items of the Otis yielded for analysis 5,250 
possible single item responses and 2^75 different possible
response patterns. In any time limit test composed
of items arranged in order of increasing difficulty,
the operating characteristic of an item reflects
both item difficulty and item inaccessibility because
some of the slower Ss never get to try some of the 
more difficult items. Both the short and the long 
forms of the Otis contain and confound these two 
sources of variance since they were derived under 
the condition of a time limit. 

The intensive model was used as our search model 
for any possible ordered structure existing among the 
response patterns. The total time required in ex- 
tracting the 20 scalable items was 18 minutes. Scale 
values were calculated for each item. Then scores, 
S and S’, were calculated for each individual. The 
testing time required for the short form was approxi- 
mately 20/75 of 30 minutes, that is, eight minutes. 

Sample 2. This was the sample used in the cross- 
validation of the short form. This sample was com- 
posed of 39 additional students who took both the 
long form and the short form. The scale values for 
items had already been derived from Sample 1. 
These scale values were used to weight the item re- 
sponses to the short form made by individuals in 
Sample 2. The scores, S and S’, were then calculated. 


Results 


Standardization (Sample 1). Twenty items 
were selected from the 75 items in the long 
form. These items, which formed an em- 





Table 1 


The Twenty-Item Test Obtained by Means of a 
Manifest Structure Analysis of the Otis 








Items Scale Values 





1 54.43 
47 55.52 
55 55.75 

55.80 
55.80 
56.00 
56.20 
56.47 
56.59 
56.80 
57.02 
57.56 
57.64 
58.31 
58.32 
58.46 
58.63 
73 60.00 
74 60.08 
71 60.26 





Note.—The first column contains the numbers which desig- 
nate items in the long form of the Otis. The second column 
contains the scale value for each item in the first column. 


pirical analogue of the intensive model, are 
listed in the first column of Table 1. The 
scale value of each item is shown in the sec- 
ond column of Table 1. Each individual's 
score, S, was calculated by means of Formula 
[2]. Scores on the long form and scores on 
the short form were plotted and a linear func- 
tion fitted to the data by the method of least 
squares. The parameters of Equation [3] 
were estimated to be 


S' = 10.06S - 513.31 [4]
The predicted score, S’, was then calculated 
for each individual by means of Formula [4]. 
The correlation between total score on the 
long form and the predicted score on the 
short form was r_TS' = .72. The t test of the
null hypothesis resulted in p < .01. 
Cross-validation (Sample 2). The scores 
(T on the long form and S on the short 
form) were obtained for each individual in 
this sample also. Formula [4] was used to 
calculate the predicted scores, S'. The correlation
between total score on the long form
and the predicted score on the short form
was r_TS' = .82. The t test of the null hypothesis
resulted in p < .01.
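
The significance statements for the two correlations can be illustrated with the usual t statistic for a product-moment correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The source does not give the formula it used, so this is an assumed reconstruction; the function name is hypothetical, while the r and n values are those reported for Samples 1 and 2.

```python
import math


def t_for_correlation(r: float, n: int) -> float:
    """t statistic for testing the null hypothesis that a correlation is zero (df = n - 2)."""
    if n < 3:
        raise ValueError("at least three cases are required")
    if abs(r) >= 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


t_sample_1 = t_for_correlation(0.72, 70)  # standardization sample
t_sample_2 = t_for_correlation(0.82, 39)  # cross-validation sample
print(f"Sample 1: t = {t_sample_1:.2f}, df = {70 - 2}")
print(f"Sample 2: t = {t_sample_2:.2f}, df = {39 - 2}")
```

Both statistics are far larger than the two-tailed .01 critical values for these degrees of freedom (roughly 2.7), which agrees with the reported p < .01.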


Discussion 


Some clarification might be desirable re- 
garding certain operations performed in a 
manifest structure analysis of the kind that 
occurred in this study. 

The question may have arisen as to just 
why the weighted score, S’, is used when the 
unweighted score, S, is already available. In 
the operations of manifest structure analysis 
the unweighted score, S, has a much smaller 
range than the manifest or criterion variable. 
The unweighted score is, however, linearly re- 
lated to the manifest variable. By weighting 
the unweighted scores, as indicated in For- 
mula [3], the prediction of the manifest vari- 
able is attained. The weighted scores, S’, 
then have a range closely approximating that 
of the manifest variable. In this instance, 
this permits us to obtain from the short form 
of the Otis the score an individual most prob- 
ably would have made on the long form of the 


Otis. The correlation between the long form
and the weighted scores obtained on the short
form, r_TS', was calculated in order to obtain
an estimate of the concurrent validity of the
shortened test. Except for rounding errors,
the correlation r_TS' should be identical to the
correlation between the total and the unweighted
scores, r_TS.
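
The equality of the two correlations follows from the fact that a product-moment correlation is unchanged by a linear transformation with positive slope. The sketch below checks this numerically; the scores are hypothetical, the helper name is an assumption, and the slope and intercept are those of Equation [4].

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical long-form totals (T) and unweighted short-form scores (S).
long_form = [40.0, 45.0, 52.0, 58.0, 63.0]
unweighted = [55.1, 55.9, 56.8, 57.9, 58.2]
# Weighted scores (S') from Equation [4].
weighted = [10.06 * s - 513.31 for s in unweighted]

r_unweighted = pearson_r(long_form, unweighted)
r_weighted = pearson_r(long_form, weighted)
print(f"r with S:  {r_unweighted:.6f}")
print(f"r with S': {r_weighted:.6f}")
```

The two coefficients agree to within floating-point error, so the weighting changes the scale of the short-form score but not its validity coefficient.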

The shortening of the long form of a test 
by means of manifest structure analysis has 
certain practical advantages over the statisti- 
cal methods employed in conventional test 
analysis. The use of a mechanical device 
called the scaling frame (1) greatly reduces 
the time and cost of the item analysis. For 
example, the scalable items shown in Table 1 
were analyzed and extracted in 18 minutes. 

The intensive catescale is similar to the 
structural response model of Loevinger’s ho- 
mogeneous tests. In both of these methods 
of analysis one attempts to relate the diffi- 
culty of an item to a particular level of abil- 
ity. In this regard, they are somewhat simi- 
lar to the item ogive of classical test methods 
relating the proportion passing the item to
degree of ability. The intensive model is,
however, different from both Loevinger’s ap- 
proach and the conventional one. In these 
two methods the item difficulty is computed 
from the proportion passing or failing the 
item, and the trait continuum is inferred from 
the content of the item. In the intensive 
catescale of manifest structure analysis, the 
scale value of an item, which is roughly 
equivalent to item difficulty, is calculated by 
reference to the manifest variable or cri- 
terion. That is, the scale value and the trait 
dimension are both operationally defined in 
terms of the relationship between an item 
and a manifest criterion. 

Other things being equal, any method that 
derives actual scale value for items should be 
superior to the conventional method of scor- 
ing—as used in the long form of the Otis— 
which simply sums the number of items got 
right. This is due to the fact that when one 
gives one point for each item got right the 
implicit and usually fallacious assumption is 
made that the trait distances between con- 
secutive terms are all equal. When a score is 
calculated on the basis of scale values, as in 
manifest structure analysis, the trait distance 
between consecutive items is specified with 
greater precision. It is obvious that from the 
long form of the Otis, which contains 75 
items, one should be able to select far fewer 
items spaced over the entire range of difficulty 
in such a way as to accomplish measurement 
of a trait. The correlation, r_TS', is a form of
concurrent validity. The two coefficients obtained
for the standardization and cross-validation
samples were .72 and .82, respectively.

In the present study, Item 1 was found to 
be the least difficult and Item 71 the most 
difficult of all items selected from the long 
form of the Otis. The difference between 
these items' scale values is 5.83 in units derived
from the total score on the long form
of the Otis. This apparent restriction in
range is not, however, as important as it may 
at first seem to be. All calculations were car- 
ried to five and rounded to four significant 
digits. By multiplying the value 5.83 by a 
constant, say 100, one gets 583. There is 
actually a range of 583 countable units avail- 
able for prediction by means of the 20-item 
short form of the Otis. Equation [4] is essen- 
tially that linear transformation which trans- 
forms the range of values obtained on the 
short form into the range of values obtained 
on the long form. 
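
The effect of Equation [4] on the range of scores can be worked through with the extreme scale values in Table 1. The sketch below is an illustration only; the variable and function names are hypothetical, while the constants are those reported in the text.

```python
SLOPE = 10.06        # m in Equation [4]
INTERCEPT = -513.31  # k in Equation [4]

LOWEST_SCALE_VALUE = 54.43   # Item 1, least difficult
HIGHEST_SCALE_VALUE = 60.26  # Item 71, most difficult


def predicted_long_form_score(s: float) -> float:
    """Equation [4]: predicted long-form score from short-form score S."""
    return SLOPE * s + INTERCEPT


short_form_span = HIGHEST_SCALE_VALUE - LOWEST_SCALE_VALUE
predicted_span = (predicted_long_form_score(HIGHEST_SCALE_VALUE)
                  - predicted_long_form_score(LOWEST_SCALE_VALUE))
print(f"span of short-form scale values: {short_form_span:.2f}")
print(f"span of predicted scores:        {predicted_span:.2f}")
```

A span of 5.83 units on the short form is stretched by the slope of 10.06 to about 58.65 points of predicted long-form score, which is the sense in which the narrow range of scale values is not a practical restriction.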


Conclusions 


1. A new method, Manifest Structure Analy- 
sis, was found applicable to the problem of 
shortening a test previously well-analyzed and 
constructed by means of classical test meth- 
ods. 

2. A short test was developed composed of 
20 items of the 75-item Otis S-A Test of Men- 
tal Ability, Higher Examination: Form B. 
The testing time for the long form was 30 
minutes. The testing time for the short form 
was eight minutes. 

3. The correlation between scores on the 
long and short forms was .72 for the stand- 
ardization sample. Cross-validation resulted 
in a correlation of .82 between the scores ob- 
tained on the long and short forms. 


Effects of Surface Friction on Skilled Performance with
Bare and Gloved Hands 1


Hilde Groth and John Lyman 
University of California, Los Angeles 


It is common knowledge that circumstances 
of environment such as temperature or harm- 
ful chemicals may require that protective 
handcoverings be worn while skilled manual 
tasks are performed. Although the available 
literature in this field is not large, there seems 
to be general agreement that all gloves, even 
the thinnest surgeon’s gloves, lead to some 
measurable performance decrement (1, 2, 3, 
8, 9). In previous studies, this decrement 
has been defined as losses in performance 
speed and quality. One of the important 
factors contributing to dexterity decrement 
has been assumed to be interference of the 
handcovering material with normal transmis- 
sion of cutaneous cues (6, 8). 

Pilot studies in this laboratory suggested 
that changes in surface friction might be of 
considerable importance as a physical factor 
in observed skill decrements, perhaps inde-
pendently of effects on cutaneous sensitivity.
In a different context we found that a meas-
urement of prehension force during manipula-
tion could be considered to be a fairly good
index of effort during manual performance at
low levels of energy expenditure (4). Since
friction and required prehension force are 
closely related on purely physical grounds it 
appeared desirable to attempt to evaluate this 
relationship in terms of various performance 
criteria. 

This investigation was designed to assess 
experimentally the effects of surface friction 
upon effort, speed of performance and rate of 
output for a simple manipulation task. It 
was hoped that simultaneous measurement of 
these dependent variables would provide us 
with information as to which aspects of a 


1 This investigation was supported by QM Con- 
tract No. DA 44-109-9M-1531 between the U. S. 
Army QM Corps and the University of California, 
Los Angeles. The opinions expressed are those of 
the authors and do not necessarily reflect those of 
the contracting agency. 


task would be most adversely affected by non- 
optimal conditions of surface friction. 

The specific hypotheses tested in this study 
were as follows: (a) Prehension force exerted 
during a task is inversely related to the size 
of the coefficient of friction between the sur- 
face of the handcovering and the manipulated 
object. (b) Speed of performance remains
stable over some range of friction and is in- 
versely related to extremely low coefficients 
of friction. (c) Rate of output is less sensi- 
tive to changes in friction but also shows an 
inverse relationship. 


Procedure 


Subjects. Twelve right-handed male engineering 
students were recruited from undergraduate classes. 

Apparatus and tasks. Figure 1 shows the ma- 
nipulation apparatus used. The electronically-con- 
trolled manipulation apparatus has been described in 
detail elsewhere (4). Its main components were a 
simple formboard, a light bank display panel and a 
split aluminum cylinder instrumented with strain 
gages. Both display and control panels consisted of 


Fig. 1. Manipulation apparatus.




a 3 × 6 matrix arrangement placed on a table 30 in.
high. Weight of the aluminum cylinder was 122 gms. 
Task performance required only the simple motion 
elements of grasp, transport and release. Instru- 
mentation permitted recording of (a) the integral 
of prehension force applied to the cylinder during 
manipulation, (b) the sum of the transport times for 
each individual movement for the duration of the 
task, (c) the sum of the cylinder transports. Ear- 
phones were worn by all Ss in an attempt to control 
the aural environment. The output of a random 
noise generator was adjusted to 65 db SPL and used 
as a masking noise. 

The task was self-paced. The cylinder had to be 
placed into that hole of the formboard which corre- 
sponded to the lighted circle on the display matrix. 
The display would change to the next position as 
soon as the contact of the preceding move was made. 
The order of the lighting sequence appeared random 
to the Ss. 

Changes in surface friction were produced by ap- 
plication of a coat of wax-benzene paste or of a 
silicone release agent (Dow Corning 7 Compound) 
to the finger tips of the bare hand and to the finger 
tips of an army leather glove (Glove, Shell, Leather, 
M-1949). By this method we hoped to obtain simi- 
lar frictional conditions for bare hand and glove 
performance. 

The coefficient of friction between each treatment
condition and aluminum was determined by the drag
method in which the mean force at the “just slip”
point over the horizontal surface is measured by
spring scales. The μ was calculated according to the
formula,

$$\mu = \frac{\text{mean “just slip” force}}{\text{weight of object}}$$

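As a minimal sketch of the drag-method calculation, the function below averages several “just slip” readings and divides by the weight of the dragged object. The function name and the sample readings are hypothetical and are not taken from the study; force and weight only need to be in the same unit.

```python
from statistics import mean


def friction_coefficient(slip_forces, weight):
    """Return mu = mean 'just slip' force / weight of object (same units)."""
    if weight <= 0:
        raise ValueError("weight must be positive")
    return mean(slip_forces) / weight


# Hypothetical spring-scale readings (gms.) for a 120-gm. test object.
readings = [80.0, 84.0, 82.0]
mu = friction_coefficient(readings, 120.0)
print(round(mu, 2))
```
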

Routine. The independent variables were com- 
bined into six treatment conditions, each of three 
minutes duration: 


1. Bare hand, wiped with alcohol (μ = 1.53)
2. Bare hand, coated with wax-benzene paste (μ = .68)
3. Bare hand, coated with silicone grease (μ = .14)
4. Leather glove, untreated (μ = .41)
5. Leather glove coated with wax-benzene paste (μ = .65)
6. Leather glove coated with silicone grease (μ = .14)


Each S was thoroughly familiarized with the tests 
by E, then given a three-minute practice trial with
bare, untreated hands. Treatments 1 to 6 were ran- 
domly administered in a subject-by-treatment design 
during a single session. The Ss were standing for 


Fig. 2. Relationship between coefficient of friction (μ) and prehension force.
all experimental conditions. During the 5-min. rest
pause between any two treatments, the Ss sat down
and friction coatings were applied by E. 

The experiment was conducted during the first 
week of July 1957. Room temperature remained be- 
tween 26° and 27° C throughout the experiment. 

Calculations. The effects of surface friction on 
prehension force were assessed by analysis of vari- 
ance by ranks (10). Heterogeneity of variance in- 
dicated the use of nonparametric statistics. Differ-
ences between any two treatments were evaluated by 
the signed rank test (10). 

The effects of surface friction on time per trans-
port and number of transports were assessed by
analysis of variance (7). Differences between any
two treatments were evaluated by t tests.

The significance level was set at P < .01. 
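
For a subject-by-treatment design, analysis of variance by ranks is commonly computed as the Friedman statistic; the sketch below assumes that this is the procedure meant and omits any correction for ties. The function names and the tiny data table are hypothetical, not the study's measurements.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(table):
    """table[s][t] is the score of subject s under treatment t.

    Scores are ranked within each subject, rank sums are formed per
    treatment, and the chi-square-by-ranks statistic is returned.
    """
    n = len(table)       # number of subjects
    k = len(table[0])    # number of treatments
    rank_sums = [0.0] * k
    for row in table:
        for t, r in enumerate(average_ranks(row)):
            rank_sums[t] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


# Hypothetical scores: 3 subjects x 3 treatments, all ordered the same way.
scores = [[1, 2, 3], [2, 3, 4], [0, 5, 9]]
print(friedman_statistic(scores))
```

Because ranking is done within each subject, large between-subject differences in variance do not enter the statistic, which is consistent with the stated reason for choosing a nonparametric method.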


Results 2


Relationship of friction to effort. Criterion 
measure: mean prehension force (PF), ob- 
tained by dividing the integral of force by 
total transport time. 

PF shows a monotonic increase with a de- 
crease in the coefficient of friction. Figure 2 
is a graphical representation of this relation- 
ship. 
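
The direction of this relationship follows from elementary statics. As a hedged illustration, the sketch below assumes an idealized grip with two opposing contact surfaces holding the cylinder against gravity, so that the smallest normal force that prevents slipping is weight / (2μ). This is a textbook lower bound, not the authors' model; the function name is hypothetical, and the measured prehension forces also reflect acceleration and safety margins that the bound ignores.

```python
def min_grip_force(weight, mu, contacts=2):
    """Smallest normal force per contact that keeps the object from slipping.

    Assumes static Coulomb friction acting on `contacts` opposing surfaces.
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    return weight / (contacts * mu)


CYLINDER_WEIGHT_GMS = 122.0  # cylinder weight reported in the Procedure

# Coefficients of friction reported for the three bare-hand treatments.
for mu in (1.53, 0.68, 0.14):
    print(mu, round(min_grip_force(CYLINDER_WEIGHT_GMS, mu), 1))
```
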

The statistical analyses led to the following 
results: 

1. For the bare hand conditions, there was 
a significant increase in PF between any two 
treatments showing a decrease in friction. 

2. The same results were obtained for the 
glove conditions. 

3. Comparison of treatments with similar 
friction, i.e., glove coated with wax with bare 
hand coated with wax, and glove with silicone 
grease with bare hand with silicone grease did 
not show any significant difference between 
the PF values. 

Table 1 summarizes the mean PF values 
and the variabilities. 

Relationship of friction to speed of per- 
formance. Criterion measure: time per trans- 
port, obtained by dividing total transport time 
by total number of transports. 
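
The two criterion measures defined so far are simple ratios of the recorded totals. The sketch below restates them; the function names and the sample totals are hypothetical and are not data from the experiment.

```python
def mean_prehension_force(force_integral, total_transport_time):
    """Mean PF = integral of prehension force / total transport time."""
    return force_integral / total_transport_time


def time_per_transport(total_transport_time, n_transports):
    """Time per transport = total transport time / number of transports."""
    return total_transport_time / n_transports


# Hypothetical totals for one three-minute trial.
force_integral = 30000.0   # gm.-sec.
transport_time = 100.0     # sec.
n_transports = 140

print(mean_prehension_force(force_integral, transport_time))
print(round(time_per_transport(transport_time, n_transports), 3))
```
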

No consistent relationship between perform- 


2 The statistical tables have been deposited with 
the American Documentation Institute. Order Docu- 
ment No. 5594 from A.D.I. Auxiliary Publications 
Project, Photoduplication Service, Library of Con- 
gress, Washington 25, D. C., remitting in advance 
$1.25 for microfilm or $1.25 for photocopies. Make 
checks payable to Chief, Photoduplication Service, 
Library of Congress. 


Table 1

Mean Prehension Forces and Variabilities

Exp. Condition        PF X̄ (gms.)    PF s (gms.)    Coefficient of Friction μ

Bare, untreated           294.2          486.69           .73
Bare, alcohol             196.2          383.23          1.53
Bare and wax              236.2          515.56           .68
Bare and silicone        1159.3         2219.33           .14
Leather glove             568.0         1154.50           .41
Glove and wax             337.9          575.93           .65
Glove and silicone        878.1         1917.03           .14


ance speed and the coefficient of friction was 
found. The mean values and variabilities are 
summarized in Table 2. 

Comparison of treatments with similar sur- 
face friction led to equivocal results. Per- 
formance with the bare hand coated with wax 
was faster than with the corresponding glove 
condition. However, for the silicone coatings 
the trend was reversed and performance with 
the coated glove was faster. 

Relationship of friction to rate of output. 
Criterion measure: total number of transports 
during the three-minute test trials. 

The statistical analysis indicated that only 
extremely low surface friction as obtained by 
silicone grease application depressed the out- 
put rate consistently. The mean values and 
the variability of the output rate are summa- 
rized in Table 3. 

Comparison of treatments with similar sur- 
face friction indicated a superior performance 
for the bare hand coated with wax but failed 


Table 2 


Mean Times Per Transport and Variabilities 


T/tr
X̄ (sec.)


Bare, untreated 
Bare, alcohol 7 
Bare and wax 7 
Bare and silicone 78 
Leather glove 72 
73 
75 


Exp. T/tr 
Condition s (sec.) 


8 
72 


Glove and wax 
Glove and silicone 

Table 3

Mean Number of Transports and Variabilities

Exp. Condition        No. of tr. X̄    No. of tr. s

Bare, untreated           122.2           33.6
Bare, alcohol             146.3           25.0
Bare and wax              150.2           26.5
Bare and silicone         123.5           36.5
Leather glove             139.9           31.8
Glove and wax             140.2           27.3
Glove and silicone        130.1           23.8





to show a significant difference between the 
two silicone conditions. 


Discussion 


Relating the results of this study to our 
hypotheses, we find that only the first hy- 
pothesis postulating the relationship between 
surface friction and prehension force has been 
supported. Furthermore, changes in prehen- 
sion force were apparently unaffected by any 
effects the handcovering had on cutaneous 
sensitivity. 

If we assume PF to be a useful index of 
effort and that there is a close relation be- 
tween the amount of effort exerted on a task 
and the time of onset of fatigue, the experi- 
mental results have additional significance. 
The present results indicate that speed and 
output rate on a manipulation task may be 
kept within narrow tolerance limits despite 
adverse conditions of surface friction, though 
the “physiological cost” requirements can be 
considered to have risen considerably. We 
would like to emphasize that our task did not 
require a high degree of manual dexterity and 
this may partly account for the failure to find 
consistent changes in performance speed as 
well as the lack of performance decrement 
attributable to distortions of sensory cues. 
The importance of surface friction on a task 
of long duration and its effects upon learning 
and fatigue on the three criterion measures 
cannot be answered from this study. How- 
ever, we feel that the results of this investiga- 
tion have rather unequivocally pointed out 
the importance of surface friction as a physi- 
cal variable for some aspects of manipulatory 
performance. That this variable should re- 


ceive adequate attention in designing protec- 
tive handcoverings for optimal performance 
seems strongly indicated. 


Summary 


The major purpose of this study was to 
assess the effects of surface friction upon 
three criterion measures of manipulatory 
performance: (a) prehension force, (b) time
per transport, (c) total number of transports. 
These measurements were considered as in- 
dices of the following aspects of performance: 
(a) effort, (b) speed, (c) output rate. 

We tried to isolate at least partially the ef- 
fects of friction from other factors of the 
handcoverings and the problem of lack of 
cutaneous sensitivity. Changes in friction 
were produced by application of either a coat 
of wax-benzene paste or of silicone grease to 
the bare finger tips and to the tips of a 
leather glove. 

It was hypothesized that a decrease in fric- 
tion would increase the amount of effort, re- 
tard the speed of performance and decrease 
the output rate. 

Twelve Ss performed a simple manipula- 
tion task which required discrete movements 
of an instrumented aluminum cylinder on a 
formboard. 

The results indicated a close relationship 
between decrease of surface friction and in- 
crease of prehension force. The effects of 
friction on time per transport remained ob- 
scure and the total number of transports de- 
creased only at extremely low values of the 
coefficient of friction. 


Received December 9, 1957. 


The Identification of Job Activities Associated with Age
Differences in the Engineering Industry


S. Griew and W. A. Tucker 1
University of Bristol, England 


A previous study has demonstrated the 
high degree of stability which exists in differ- 
ences of age distribution between jobs in the 
engineering industry, both over a period of 
time and over a range of firms and areas (7, 
8). The results of this investigation suggest 
that a substantial proportion of men as they 
reach middle age must leave those jobs which 
are usually manned by younger workers and 
that certain features of these jobs probably 
make them unsuitable for older workers. If 
ways could be suggested of modifying jobs in 
which these features are present, some of the 
wastage due to migration from “young” jobs 
might be eliminated. 

There seem to be two approaches to the 
problem of suggesting modifications. First, 
by paying attention to the results of experi- 
mental studies of performance changes asso- 
ciated with age (e.g., 2, 10), one could base 
recommendations upon known changes occur- 
ring with age, applying laboratory findings 
directly to the work situation. It is doubt- 
ful, however, whether sufficient is yet known 
about the relationship between age and total 
job behavior to justify this approach. 

The second possible approach involves the 
preliminary study of jobs themselves, in order
to identify those features which may be criti-
cal in the effective performance of older
workers. 

The purpose of this paper is to report an 
attempt, involving job study, to broadly 
identify job activities which are likely to 
differ among younger and older workers and 
which may be taken to represent areas in 
which, after more detailed investigation, 
modifications are likely to prove effective. 


1W. A. Tucker is now with the Wool (and Allied) 
Textile Employers’ Council Work Study Centre, 
Bradford, England. The authors are indebted to the 
Nuffield Foundation of London, which financially 
supported this project. They are also grateful to 
K. F. H. Murrell for comment and suggestions. 


Method 


Before undertaking the job studies, two methodo- 
logical issues required attention. In the first place, 
upon what basis should jobs be selected for study? 
Secondly, what sort of information should be col- 
lected about them? After consideration of the first 
of these issues, the obvious approach of comparing 
the contents of jobs manned by younger workers 
with those manned by older workers appeared dan- 
gerous and the results likely to prove misleading. 

Since the now classical study of secretaries (4), 
the dangers inherent in the use of job titles have 
been emphasized repeatedly. It is misleading to 
treat job titles as if they embraced a complex of ac- 
tivities and requirements all of which are to be 
found to the same degree in the work of each per- 
son assuming the title. In previous industrial studies 
in the field of ageing (10), the caution required be- 
fore it could reliably be assumed that persons doing 
nominally the same job were doing exactly the same 
work was recognized. Variations of activity which 
appear, on the surface, to be unimportant may be 
vital in the study of special groups such as older 
workers. Although this point is not especially 
stressed, it is probably largely responsible for Bridges’ 
(3) and Hanman’s (6) pleas for intensive, compre- 
hensive and accurate job analyses in the case of dis- 
abled workers. 

It was decided instead to examine the actual work 
engaged in normally by a group of younger and a 
group of older workers. In this way, it was hoped 
that critical differences in work content would be 
displayed. The two groups were distinguished 
sharply by age, in order to contrast them as much 
as possible, and to reduce possibilities of overlap in 
“effective” age.2 The two groups were randomly 
drawn from two populations of workers employed 
in a firm manufacturing piston and “turboprop” 
aero engines. The two populations were roughly 
matched for such things as age of entry into engi- 
neering, domestic circumstances, type of education 
and minimum length of service (three years) in their 
jobs at the time of the study. Minimum length of 
service was introduced in order to increase the 
chances of selecting for examination men who, by 


2 Had there been time to study very large samples
in each of the jobs, the original approach would 
probably have been justified, but this would have 
been tantamount to studying the work done by men 
of different ages, except that the clear separation of 
the age groups would not have been possible. In 
any event, the final analysis of data would have been 
more complex, and interpretation more difficult. 
virtue of their retention in their jobs, could be con-
sidered of at least average effectiveness. The two 
populations were aged 24 to 30 years, and 48-61 
years, and included occupants of 10 specially se- 
lected jobs. These, which were selected on the ba- 
sis of their being reasonably comparable in terms of 
the basic activities they involve, were: 


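Borers (Y: [illegible]; O: 1)
Capstan Operators (Y: [illegible]; O: 1)
Drillers (Y: [illegible]; O: 5)
Fitters (Y: [illegible]; O: 7)
Grinders (Y: [illegible]; O: 4)
Inspectors (Y: [illegible]; O: 7)
Instrument Makers (Y: [illegible]; O: 4)
Millers (Y: [illegible]; O: 6)
Polishers (Y: [illegible]; O: 3)
Turners (Y: [illegible]; O: 8)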


The figures in brackets refer to the number of 
younger (Y) and older (O) occupants of jobs in
the final samples which were studied. The nominal 
roll of each population contained approximately 850 
names, and random sampling was achieved by select- 
ing for study every twentieth name appearing on 
these alphabetical lists. 

The process produced a younger group of 42 work- 
ers, and an older group of 46 workers. Before start- 
ing the analysis, the “satisfactoriness” of each of 
these 88 workers was checked with the firm’s labour 
department, as an additional measure to ensure that 
reasonably effective men were, in fact, being studied, 
and the personal cooperation of each man was 
sought. Cooperation was given freely, and all mem- 
bers of both groups were passed by the firm as being 
of at least average effectiveness in their jobs. 
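
Selecting every twentieth name from an alphabetical list is usually called systematic sampling. The sketch below is a minimal illustration under stated assumptions: the roll is taken to hold exactly 850 names (the text says approximately 850), the names are placeholders, and the helper `every_nth` is hypothetical.

```python
def every_nth(names, step=20):
    """Return every `step`-th entry of a list, starting with the step-th name."""
    return names[step - 1::step]


# Placeholder roll of exactly 850 names in alphabetical (here numeric) order.
roll = [f"worker_{i:03d}" for i in range(1, 851)]
sample = every_nth(roll)

print(len(sample), sample[0], sample[-1])
```

With a roll of exactly 850 names this yields 42 men, the size reported for the younger group; a group of 46 would correspond to a somewhat longer roll of roughly 920 names, which is an inference from the sampling interval and not a figure given in the text.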

The second issue of what information should be 
collected about the work of each man raised many 
questions germane to the whole practice of job analy- 
sis. Many writers have stressed the dangers of rat- 
ing job content subjectively, and at least one study 
in recent years has clearly demonstrated the lack of 
consistency which may exist between job analyzers 
(9). A discussion of job analysis issues is outside 
the scope of this paper, but it may be noted that the 
purpose of this study was such that most of the con- 
ventional methods of obtaining information about 
jobs, geared as they are to the establishment of se- 
lection procedures and the like, seemed inapplicable. 
Basically, what was required was a system of classi-
fying job activities which reflects a model of the hu-
man operator which is both legitimate on psychologi- 
cal grounds, yet comprehensive enough to cover job 
contents which may have implications outside the 
formal scope of psychology. At the same time, this 
system should be well defined enough to allow the 
recording of job activities in an absolutely objective 
manner. 

It was decided to tackle the problem by concen-
trating on those wide areas of activity which ac-
companied most of the work covered by the study, 
and which could be objectively recorded by timing, 
with the aid of a stop watch, total work cycles, and
the time spent in the various activities of which it 
was composed, and by noting the frequency of ac- 
tivities during the work cycle. 

For purposes of indicating the scope of the in- 
vestigation, a list of areas which were considered 
follows: 


Lifting: weight, height, distance, frequency 

Posture: standing, sitting, walking, stooping 

Bodily Activity: hands/arms, trunk, legs 

Controls: total number, number used, frequency 
used, relationship with displays 

Displays: total number, number used, frequency 

used 

Instruments Used 

Instructions Followed 

Tolerances Worked to 

Work Cycle Data: total length, and, in the case of 

machinists, proportion of cycle time spent fitting 
and removing components, setting, machining, 
checking, on “automatic” 

Working Speed 

Degree of “Visual Attention” Required 

Degree of “Perceptual-Motor Co-ordination” Re- 

quired 

Work “Finish” 

In actual practice it proved impossible to record 
the last two items objectively, owing probably to 
the inadequacy of definition. These were conse- 
quently disregarded in conducting analyses as find- 
ings about them would probably have been mis- 
leading. 

Each member of each group was observed at 
work for between one to four or five days, accord- 
ing to the length of his work cycle. Information 
about the contents of his work was recorded on a 
specially prepared form, only after the analyzer was 
certain that variations (which turned out to be small 
enough in all cases except one to be negligible) were 
noted. Information was then transferred to “master 
sheets,” and the presence and extent of each feature 
in the two groups were computed, and statistical 
comparisons between the groups undertaken. 


Results 


In presenting the results of this study, 
we will concentrate upon those job features 
in which significant differences between the 
groups were displayed. Certain features 
showed no difference; this may have been 
due to the fact that these features are not 
critical in relation to age, as well as to the 
probable lack of sensitivity of the methods 
used. The features which failed to show sig- 
nificant age differences were those concerned 
with the degree of lifting involved, bodily ac- 
tivity (which very nearly showed a signifi- 
cant difference), the instruments used, the in- 
structions followed, the tolerances worked to,
the speed of working, and, as previously 
stated, “perceptual-motor co-ordination” and 
work “finish.” The two points of greatest 
interest in this catalogue of negative results 
are, first, the absence of speed as a factor 
critical in the employment of older workers, 
and, secondly, the apparent unimportance of 
“physical effort” in the form of lifting. 

Previous workers, notably Belbin (1) and 
Welford (10), have emphasized the critical 
nature of certain forms of pacing and speed 
stress, and it is not surprising that this study 
did not confirm these previous findings as, in 
the factory in which this study was under- 
taken, all working speeds were essentially un- 
der the control of individual operators. 

The apparent unimportance of “physical 
effort” is also in line with Belbin’s (1) previ- 
ous findings. When heavy work is not com- 
bined with certain other factors it seems that 
it does not assume critical proportions. One 
of the most important additional factors com- 
bining with heavy work to provide difficulty 
is pacing, and this was not present. It is rea- 
sonable, therefore, that lifting should not have 
appeared as critical. By the same token, other
factors might have assumed critical propor- 
tions had pacing been present, and this should 
not be overlooked in considering the negative 
results. 

The following features showed significant 
differences between the younger and older 
groups: 

Stooping. A record was made of all those 
workers who worked for more than 50% of 
their time stooping at an angle of more than 
approximately 30° from the vertical. Of the 
42 younger workers, 18 worked under condi- 
tions of stooping satisfying this criterion, and 
of the 46 older workers, only six were found 


Table 1 
The Use of Controls 








Proportion of Used/Total Controls 
<.40 .40-.60 >.60







Table 2 
The Use of Displays 








Proportion of Used/Total Displays 
<.40 





.40-.60





Younger Workers 1 
Older Workers 13 


to stoop to this extent. The difference be-
tween these proportions, .30, was significant
at p < .01.
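
The reported difference and its significance can be checked with a pooled two-proportion z test. The paper does not name the test it used, so this choice is an assumption; the counts (18 of 42 younger workers, 6 of 46 older workers) are those given in the text, and the function name is hypothetical.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Return (difference, z, two-sided p) for a pooled two-proportion z test."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # normal tail area, both sides
    return p1 - p2, z, p_two_sided


diff, z, p = two_proportion_z(18, 42, 6, 46)
print(round(diff, 2), round(z, 2), p < 0.01)
```

Under this assumption the difference rounds to .30 and the two-sided probability falls below .01, in agreement with the statement in the text.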

Controls. Taking the number of controls 
present on a machine, and expressing the 
actual number used during work as a pro- 
portion of the total, a contingency table was 
constructed, in which cell entries refer to the 
number of cases observed to fall into the cate- 
gories indicated. This is shown in Table 1. 
From these data a χ² of 7.23 was calculated,
which, with df = 2, is significant at p <
.05. The proportion tended to be greater in
the case of older workers. 

Displays. Taking the same ratio in the 
case of displays in use (in all cases, these 
took the form of scalar indicators), Table 2 
was constructed. In this case, χ² = 9.97,
which is significant at p < .01. In this
case, the proportion tended to be less in the 
case of older workers. 
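
For two degrees of freedom the upper-tail probability of χ² has the closed form exp(-χ²/2), so the significance levels quoted for controls and displays can be checked without tables. The text states df = 2 only for the controls table; treating the displays table as also having df = 2 (two groups by three categories) is an assumption, and the function name is hypothetical.

```python
import math


def chi2_upper_tail_df2(chi2):
    """Upper-tail probability of a chi-square variate with 2 degrees of freedom."""
    return math.exp(-chi2 / 2.0)


p_controls = chi2_upper_tail_df2(7.23)   # reported as significant at p < .05
p_displays = chi2_upper_tail_df2(9.97)   # reported as significant at p < .01

print(round(p_controls, 3), round(p_displays, 4))
```
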

Machining. The proportion of the work 
cycle spent by machinists with machines actu- 
ally running, as opposed to setting, checking 
or fitting or removing components, was calcu- 
lated. Again, a contingency table was con- 
structed from these data, and this is shown in 
Table 3. These data gave χ² = 8.75, which
is significant at p < .02. The proportion
tended to be greater in the older group of 
workers, implying that older workers tended 


Table 3 
Machining Time 








Proportion of Work Cycle 
Spent Machining 





<.40 .40-.60 >.60





10 
20 


Younger Workers 6 10 
Older Workers 1 7 


Younger Workers 8 4 
Older Workers 1 7 


10 
19 
Table 4


Proportion of Working Day Spent 
Looking at the Work 





Younger Workers 
Older Workers 


to spend less time than younger workers in 
setting and checking, etc.3

Visual Attention. The use of this expres- 
sion is perhaps somewhat misleading. It was 
used in calculating the proportion of the time 
spent during the working day actually looking 
at some aspect of the work. Table 4 shows 
the relative proportions in the case of younger 
and older workers. In this case χ² = 14.38,
which is significant at p < .01. The time
spent looking at the work tended to be greater 
in the younger group. 

In addition to these features, one other 
showed a nonsignificant difference which
nearly was significant at the 5% level, and 
which deserves mention in passing. The 
amount of activity of hands and arms dif-
fered in the two groups: it was generally less
in the older group. Whilst this result can-
not be taken too seriously, it is interesting 
that Belbin (1) quotes a similar finding. It 
is interesting also that recent work has indi- 
cated that the monitoring of movement may 
be an activity which is a source of increasing 
difficulty with age (5, 11). 


Discussion 


The findings relating to machine controls 
and displays are extremely difficult to inter- 
pret. They may be taken to provide prima 
facie practical support of Welford’s hypothe- 
sis (10) that deterioration occurring with age 
mainly affects central organization on the re- 
ceptor side. Alternatively, a concept of “re- 


3 Discrepancies between total numbers in groups in
the contingency tables relating to control, displays, 
and machining, are due partly to the fact that on 
certain machines no scalar indicators were present, 
and partly to variations, in one or two cases, in the 
proportion of the work cycle occupied by machining 
which could not be accurately determined. 
dundancy” may prove useful in interpreting
the findings, and planning clarificatory studies. 

It is quite likely that stooping and watch- 
ing occur together, and that both occur more 
frequently when machining is not in progress. 
Again, further investigation should clarify 
these issues, and one of the more important 
results of such a further investigation should 
be the identification of which of these fea- 
tures, if indeed they are related, is the most 
critical, as it is likely that two of them may 
only appear to be critical in view of their 
being always associated with the third, the 
really critical feature. At any event, these 
three features define a broad area which 
should warrant closer examination. 


Summary 


Approaches to job study preliminary to the 
modification of industrial equipment for the 
use of older workers are discussed. A study 
is described in which the work of a younger 
group and an older group of engineering work- 
ers is examined in order to identify broad 
areas in which features critical to the effective 
performance of older workers are to be found. 
The results suggest two broad areas in which 
more detailed study should be repaid, and
in which modifications may prove effective. 
These relate to the existence of redundant 
controls and scalar indicators upon machine 
tools, and the prevalence of stooping and the 
closeness with which work has to be watched, 
in relation to certain machining activities 
other than those involved when machines are 
actually running. 


Received December 11, 1957. 


The Effect of Display Width in Merchandising Soap


Douglas H. Harris 


Occupational Research Center, Purdue University 


For many years, grocers have used the 
technique of increasing the shelf display 
width of canned goods to sell out more 
rapidly an inventory of slow selling merchan- 
dise. The success of this procedure would 
tend to indicate that shoppers in self-service 
stores buy on impulse and, thus, are greatly 
affected by conditions at the point of selec- 
tion. On the other hand, however, this may 
be true only when infrequently advertised 
products are involved, since in this situation 
“brand loyalty” may not become a deterring 
factor in impulse buying. 

The purpose of this experiment is to deter- 
mine whether or not buyers in a self-service 
store are influenced to buy relatively more 
products from a wider display than from a 
narrower one, when the products involved are 
well advertised. 


Method 


Selection of markets. Three supermarkets were se- 
lected in Lafayette, Indiana, on the basis of location. 
Supermarket I was located on the east side of the 
city in a neighborhood of lower income families. 
Supermarket II was located on the south side of the 
city in a neighborhood of higher income families. 
Supermarket III was located next to Purdue Univer- 
sity. 

Selection of soap brands. Two soap powder prod- 
ucts were selected within each market in accordance 
with the following criteria: 

1. Both products must be classified by the manu- 
facturers as heavy detergents (all-purpose deter- 
gents). 

2. Both products must be handled only in the 
large and giant sizes in the store. 

3. Both products must be packaged in a box which 
is primarily blue in color. 

4. Both products must sell for the same price. 

5. There must be no sales promotion gimmick con- 
nected with either product. 

6. Both products must be well advertised, both 
nationally and locally. 

7. The sales ratio, in giant size boxes for the previ- 
ous month, of one product to the other must be 6 to 
4 in each market. There is nothing special about 
the 6 to 4 ratio except that when products were 
chosen to meet the above criteria and also to pro- 
vide the same sales ratio in each market, the 6 to 4 
ratio in each store resulted. The larger selling prod- 
uct will be called A, the lesser selling product B. 

Procedure. Three shelf-display situations were 
used in each of the three stores: 


Situation one: 3 facings of A—1 facing of B 
Situation two: 2 facings of A—2 facings of B 
Situation three: 1 facing of A—3 facings of B 


In each situation, the depth and height of each 
product display were kept equal. The two products 
were located side by side toward the middle of the 
soap section and this position remained constant 
throughout the experiment. The display width of 
the smaller size of each product was constant at 1 
facing each, located to one side of the giant size dis- 
play and with depth and height kept equal. 

The display situations were changed in each store 
after a total of 10 boxes of both products had been 
sold. Starting time for each store was on a Wed- 
nesday afternoon with the display being changed in 
the following order: 


                  Market I     Market II    Market III
First Display     1A and 3B    2A and 2B    3A and 1B
Second Display    3A and 1B    1A and 3B    2A and 2B
Third Display     2A and 2B    3A and 1B    1A and 3B


Results 


Table 1 

Number of Selections by Supermarket Shoppers 

Store:      Combined       I         II        III
Product:     A    B      A    B    A    B    A    B

Display      (one column of values legible)
Expected     12
3A and 1B    10
2A and 2B     8
1A and 3B     9


The results are presented in Table 1. Chi 
square for the difference between obtained and 
expected frequencies is not statistically signifi- 
cant at the 10% level for the combined store 
totals nor for any one store. It is recognized 
that the power of this test for the size of sam- 
ples involved is not great, but it is supported 
by the additional evidence that in not one 
store is there a trend in the direction of the 
alternate hypothesis. 
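
The comparison of obtained with expected selections can be sketched as a chi-square goodness-of-fit computation. This is only an illustration and not the author's own calculation: the total of 30 selections and the observed counts are invented, and the expected split simply applies the 6 to 4 prior sales ratio to that assumed total.

```python
def chi_square_gof(observed, expected):
    """Pearson chi-square statistic for observed vs. expected counts."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def expected_from_ratio(total, ratio):
    """Split a total into expected counts proportional to a ratio such as (6, 4)."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


# Hypothetical example: 30 selections, prior sales ratio A:B of 6 to 4.
expected = expected_from_ratio(30, (6, 4))
observed = [20, 10]  # illustrative counts only, not data from the study
statistic = chi_square_gof(observed, expected)
print(round(statistic, 3))
```

With two categories the statistic has one degree of freedom, for which the 10% critical value is 2.706; a statistic below that value would not be significant at the level the author uses.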

This study demonstrates, then, that increas- 
ing the relative display width of a packaged 
soap product does not increase the relative 
sales of that product. 




Summary 


The relative display widths of two well-ad- 
vertised packaged soap products were varied 
in each of three supermarkets. 

The resulting selections by a total of 90 
shoppers indicated that increasing the rela- 
tive display width of a well-advertised soap 
product does not increase the choices of that 
product by self-service store shoppers. 


Received December 20, 1957. 
Accuracy of Recall Using Keyset and Telephone Dial, and 
the Effect of a Prefix Digit¹ 


R. Conrad 
Applied Psychology Research Unit, Cambridge, England 


A topic of current interest to telephone en- 
gineers is concerned with the relative merits 
of conventional telephone dials and a decimal 
set of keys (keysender) for transmitting tele- 
phone numbers. One advantage of the key- 
sender is that the telephone operator is not 
delayed by the interdigital pauses that occur 
with dials, so that speed of sending may be 
greatly increased. The two methods are thus 
distinguished by the fact that in dialling, the 
operator is partly paced because the upper 
speed limit is set by the design of the instru- 
ment. 

A recent study (6) using eight-digit mes- 
sages has shown paced recall to be inferior to 
unpaced recall. This finding would have little 
significance in the problem of the relative 
merits of dial and keysender so long as the 
length of telephone numbers was well within 
the normal span of immediate memory, but 
with the use of long digit sequences in na- 
tional trunk numbering systems (up to 10 
digits in Great Britain), errors of memory are 
liable to occur. Although the conditions of 
paced recall in the experiment referred to 
above were considerably more constrained 
than is usual in dialling telephone numbers, 
it seemed justifiable to determine whether the 
results would generalize to a more realistic 
field situation. 

A feature of many trunk numbering sys- 
tems is that all trunk numbers—as contrasted 
with local area numbers—are prefixed by a 
digit which is always the same and which acts 
as a switch. In Britain the digit 0 is used. 
Within the appropriate class of numbers, this 
digit is redundant. Nevertheless, for any one 
of several reasons one might predict that its 
presence would increase the probability of 


1 The author wishes to thank the British General 
Post Office and the Union of Post Office Workers for 
providing the facilities for this experiment. M. Stone 
advised on statistical treatment, and Barbara A. Hille 
carried out the tests. The work was supported by 
the British Medical Research Council. 


memory failure. In spite of its redundancy, 
for instance, the prefix might be treated as an 
extra digit, which if it occurred in a critical 
region of the immediate memory span would 
lead to increased error. Or by merely delay- 
ing the transmission of the succeeding digits, 
the processes of decay of memory might be 
hastened. A second aim of the experiment to 
be reported therefore was to test the hypothe- 
sis that the use of a redundant prefix digit 
would have no effect on memory errors. 


Method 


Apparatus. Two types of instrument were used. 
The keysender was composed of two horizontal rows 
of circular keys numbered 1-5 and 6-0, from left to 
right. These were mounted as a normal part of a 
telephone operator’s training position. Pressing a 
key was registered as an illuminated number on a 
panel located some distance away and behind S. 
When an operator keyed out a sequence of digits, 
the entire sequence remained illuminated for some 
10 sec., enabling E to record the order in which keys 
were pressed. S indicated the end of a sequence by 
pressing a key marked FIN. 

The second instrument was a conventional British 
G.P.O. dial telephone mounted in front of S on the 
same training position as the keysender. The se- 
quence of digits dialled was automatically recorded 
in clear print on a Zoller Recorder situated in an- 
other room. 

Digit messages were recorded on a Ferrograph tape 
recorder at a rate of 100/min., with an interval be- 
tween messages for S to respond. The output of the 
recorder was fed into a pair of headsets, one worn 
by S and one by E. The S and E could also talk to 
each other through the same headsets. 

Test material. Throughout, eight-digit messages 
were used, the digits being drawn from a decimal 
vocabulary. The messages were carefully constructed 
in such a way that each digit occurred an equal 
number of times in each serial position, obviously 
easy phrases being avoided. Chi-squared tests at 
the end of the experiment showed that there were 
no significant differences among the messages in 
terms of frequency of correct recall. Eighty mes- 
sages were constructed and arranged in four lists 
each of 20 messages. 

Subjects. The Ss were 24 female Post Office tele- 
phonists, who had the special merit of being thor- 
oughly experienced in the use both of the keysender 
and the dial. The keysender was normal equipment 
in everyday use, whilst the dial was sometimes used 
during work, and always used out of working time. 
No special training was therefore necessary. All Ss 
were volunteers, and the testing was carried out dur- 
ing working hours. 

Design. There were four experimental conditions 
designated as follows: 


K    Keysender 
KO   Keysender used with prefix digit 0 
D    Dial 
DO   Dial used with prefix digit 0 


These four conditions were randomized in six differ- 
ent 4 × 4 Latin squares, and each S was tested under 
each condition with a different list of messages. 
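
The counterbalancing described in the Design paragraph can be sketched as below. This is an assumed reconstruction: the paper does not say how its six squares were drawn, so the sketch shuffles the rows, columns, and labels of a cyclic square, which yields valid Latin squares but does not sample all possible squares uniformly. The function names and the fixed seed are hypothetical.

```python
import random

CONDITIONS = ["K", "KO", "D", "DO"]


def random_latin_square(symbols, rng):
    """Return a randomized Latin square: one row per subject, one column per test order."""
    n = len(symbols)
    # Start from the cyclic square, then shuffle rows, columns, and symbol labels.
    base = [[(r + c) % n for c in range(n)] for r in range(n)]
    rows = list(range(n))
    cols = list(range(n))
    labels = list(symbols)
    rng.shuffle(rows)
    rng.shuffle(cols)
    rng.shuffle(labels)
    return [[labels[base[r][c]] for c in cols] for r in rows]


def is_latin_square(square):
    """Check that every symbol occurs exactly once in each row and each column."""
    n = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(n)} == symbols for c in range(n))
    return len(symbols) == n and rows_ok and cols_ok


rng = random.Random(0)  # fixed seed so the sketch is reproducible
squares = [random_latin_square(CONDITIONS, rng) for _ in range(6)]
schedule = [row for square in squares for row in square]  # one row per subject
print(len(schedule))  # six squares of four rows give 24 subject orders
```

Each of the 24 rows gives one subject's order of the four conditions; within any one square every condition appears once in every test position.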

Procedure. In the K and D conditions, Ss were 
merely told to listen to the messages and when each 
had ended to key or dial what they had heard. Be- 
fore the KO and DO conditions, the same instruction 
was given, but Ss were told that all messages were 
to be prefixed by the digit 0 which was not on the 
tape. Messages beginning with 0 were to be treated 
in the same way. Since Ss were familiar with the 
use of prefix digits, no difficulty was encountered in 
giving these instructions. Each condition required 
about 10 minutes and Ss did two conditions on one 
day and two on the next day. 


Results 


A message was scored as correctly repro- 
duced only when all digits were given in the 
correct order. The mean scores of each of 
the six latin squares are given in Table 1, 
from which it will be seen that the differences 
between conditions are in the predicted direc- 
tion. Since the scores of individuals for each 
condition are out of 20, they were normalized 
by making the refined angular transformation 
due to Anscombe (1). Analysis of variance 
was then carried out, the results of which are 
summarized in Table 2. 


Table 1 
No. of Correct Messages (Max. 

Column heads legible: K, KO. Row label legible: Mean of Square. 
Values legible: 11.50, 11.50, 14.00, 9.00, 9.00, 12.00, 9.75, 9.04 


Table 2 
Analysis of Variance for Number of Correct Messages 

Column heads legible: Source of Variation, df, Mean Square, F, P. 
One column of values is legible and is shown beside the df. 

Source of Variation          df
Squares                       5      2,446.00
Subjects within Squares      18     28,651.29
Test Order                    3      1,105.57
Conditions                    3      3,018.28
Order × Squares              15        197.31
Conditions × Squares                   215.07
Residual                     36        314.12
Total                        95

Note. The error variances of the six Latin squares were 
tested and found to be homogeneous before pooling in the 
above table. 
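
The angular transformation applied to the scores before the analysis of variance can be sketched as follows. The exact form is an assumption: the paper cites Anscombe without giving the formula, and the sketch uses the form commonly attributed to him for binomial counts, arcsin of the square root of (x + 3/8)/(n + 3/4), with n = 20 messages per condition.

```python
import math


def anscombe_angular(correct, n_trials, degrees=True):
    """Angular transform of a binomial count.

    Assumed form: arcsin(sqrt((x + 3/8) / (n + 3/4))).
    """
    if not 0 <= correct <= n_trials:
        raise ValueError("correct must lie between 0 and n_trials")
    angle = math.asin(math.sqrt((correct + 0.375) / (n_trials + 0.75)))
    return math.degrees(angle) if degrees else angle


# Scores are out of 20 messages per condition.
for score in (0, 5, 10, 15, 20):
    print(score, round(anscombe_angular(score, 20), 2))
```

The transform stretches the ends of the scale, so that scores near 0 or 20 are not compressed relative to scores near the middle, which is the usual reason for normalizing proportions before an analysis of variance.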

Of major relevance is the variance between 
conditions which shows differences significant 
at better than the .001 probability level. The 
differences between pairs of conditions were 
tested by Duncan’s multiple range test (8) 
used at the .05 level with the following re- 
sults: 

Dial versus keysender. This result is not 
conclusively shown. The keysender is not 
significantly better than the dial in the sim- 
ple eight-digit condition. When the stress of 
a prefix digit is added (KO v. DO), a clear 
advantage for the keysender, which is signifi- 
cantly better, is seen. The nature of this 
stress will be referred to later. 

Effect of prefix digit. Adding a prefix digit 
results in significantly fewer correct messages 
whether the keysender or dial is used. A 
glance at Table 1 shows how consistent and 
pronounced this effect is. 

Digit confusion. Errors occurring in S’s 
reproduction of a message can be classified 
into two broad groups: first, order errors, 
when S has clearly transposed usually two, 
but sometimes more, digits. The correct 
digits are given in the wrong order in such 
a way as to suggest that S is not merely 
guessing; e.g., the message 80274163 will be 
reproduced as 80271463. The second group 
comprises all other errors of which three kinds 
are likely: (a) S forgets one or more digits 
and leaves blanks, (b) S forgets one or more 
digits and guesses, (c) S consistently con- 
fuses certain pairs of digits. In most cases, 
omissions are obvious, and guesses are equiva- 
lent to omissions. But there is the problem 
of distinguishing guesses from genuine con- 
fusions. It can be assumed that if S guesses, 
he will choose all possible digits with equal 
probability. Then a digit which occurs in the 
place of another digit more often than would 
be expected by chance can be regarded as 
genuinely confused. In the present experi- 
ment, this analysis was simplified because 
each digit occurred with equal frequency in 
the test messages. 
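
The scoring rule and the first broad error class can be sketched as below. It is a simplification under stated assumptions: any response containing the same digits in a different arrangement is counted as an order error, whereas the paper relies on E's judgment that S "is not merely guessing"; the function names are hypothetical.

```python
from collections import Counter


def is_correct(presented, recalled):
    """A message counts as correct only if every digit is in the right position."""
    return presented == recalled


def classify_response(presented, recalled):
    """Return 'correct', 'order', or 'other' for one recalled message."""
    if is_correct(presented, recalled):
        return "correct"
    # Same digits in a different arrangement: treated here as an order error.
    if Counter(presented) == Counter(recalled):
        return "order"
    return "other"


print(classify_response("80274163", "80271463"))  # the paper's example of an order error
print(classify_response("80274163", "80274168"))  # a substituted digit
```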

The data from all four conditions were 
pooled, and a confusion matrix set up. The 
expected value in each cell was one ninth of 
the total number of times each digit was 
wrong, since each digit could be confused with 
any of nine others. The difference between 
observed and expected distribution of errors 
amongst the nine possible digits was sepa- 
rately calculated for each of the 10 digits 
used. Yates’ corrected chi squared was used 
to test these differences and in only one case 
were the two distributions significantly differ- 
ent (.05 level). The digit 2 was called 3 
more often than would be expected by chance. 
But 3 was not called 2 beyond chance fre- 
quency. The only special feature of this con- 
fusion is the contiguity of the digits in the 
numerical scale and on the layout of key- 
sender and dial. Since no other contiguous 
pairs were confused, this particular relation- 
ship appears to be of no special significance. 
All other apparent confusions must be re- 
garded as guesses, i.e., completely forgotten. 
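
The test of one digit's substitutions against the guessing model can be sketched as follows. The counts are invented for illustration, and the sketch computes the plain Pearson statistic; the Yates correction mentioned in the text is not applied here.

```python
DIGITS = "0123456789"


def confusion_chi_square(target, substitutions):
    """Pearson chi-square for one target digit against uniform guessing.

    substitutions maps each wrong digit to the number of times it was given
    in place of the target. Under guessing, each of the nine alternatives
    has expected frequency total / 9.
    """
    others = [d for d in DIGITS if d != target]
    observed = [substitutions.get(d, 0) for d in others]
    total = sum(observed)
    if total == 0:
        raise ValueError("no errors recorded for this digit")
    expected = total / len(others)
    return sum((o - expected) ** 2 / expected for o in observed)


# Invented counts: the digit 2 is replaced by 3 unusually often.
subs_for_2 = {"0": 2, "1": 3, "3": 11, "4": 2, "5": 3,
              "6": 2, "7": 1, "8": 2, "9": 1}
statistic = confusion_chi_square("2", subs_for_2)
print(round(statistic, 2))
```

With nine alternatives the statistic has eight degrees of freedom, for which the .05 critical value is 15.51.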


Discussion 


The effect of prefixing a message with a 
redundant digit is clear. It will be seen from 
Table 1 that in the case of both keysender 
and dial none of the six squares shows a 
counter effect. The simplest explanation 
might be that Ss treat the prefix as an extra 
digit making the message effectively one of 
nine digits. In general, this is certainly not 
true, since the proportion of errors at each 
serial position is the same in both prefix and 
nonprefix conditions. If the prefix digit were 
regarded as conveying as much information 
as the others, it would itself be subject to 
some error and the first digit of the message 
proper would show as much error as the sec- 
ond digit in the nonprefix condition. Neither 
of these two effects occurs. It seems fairly 
certain that all Ss treat the prefix 0 as being 
redundant. Giving due regard to the differ- 
ence between conditions, the proportions of 
correct messages recorded are compatible with 
those from a similar group of Ss for eight- 
digit messages, and incompatible with previ- 
ously reported scores for nine-digit messages 
(5). 

A second possible explanation might be 
along the lines of decay theory suggested by 
Brown (4), Broadbent (3) and others. On 
this view, the longer the interval between 
presentation and recall of a digit, the greater 
is the chance of forgetting. This explanation 
would satisfy the data for the dialling condi- 
tions. Dialling the digit 0 interposes a rela- 
tively long delay before the required mes- 
sage can be recalled. But this cannot so 
justifiably be claimed for the keysender. De- 
lay is indeed present, but it is very short; yet 
the increase in error is almost as large as it is 
in the case of dialling. 

It may be that merely remembering the 
prefix diminishes ability to recall the message, 
and this could occur if it interfered with im- 
mediate postpresentation rehearsal. Although 
the explanation of this effect is uncertain, 
some of the possibilities discussed could easily 
be tested. 

The advantage of keysender over dial has 
only partly been shown. But the effect which 
shows when the prefix is used, is so pro- 
nounced that there can be little doubt about 
it. The extra stress of the prefix not only 
worsens performance, but also differentiates 
between keysender and dial. It would be 
tempting to think that had nine-digit mes- 
sages been used, the expected effect would 
have been more clearly demonstrated. It will 
be recalled that the predicted effect was based 
on the results of a previous experiment em- 
ploying paced recall (4). In the present ex- 
periment there was the important difference 
that S was free to rehearse the message be- 
tween presentation and recall. That this dif- 
ference is important is evident from the dif- 
ference in performance level between the two 
comparable groups of Ss. In the present ex- 
periment, the nonprefix paced recall (dial) 
condition yields about 50% correct messages. 
In the earlier study without rehearsal, the fig- 
ure is about 35%. In summary, it seems rea- 
sonable to conclude that if the difficulty of 
recall is such that less than half the messages 
are correct, then the keysender will show a 
significant advantage over the dial. 

The analysis of digit confusions indicates 
that there is no feature of the immediate 
memory function which could lead to system- 
atic confusions of one digit with another for 
whatever reason. It appears that if con- 
fusions occur, they must be ascribed to weak- 
ness elsewhere in the communication system. 
Indeed it has been shown (7) that when 
spoken digits are automatically recognized by 
a machine, if the digit 2 is confused, it is 
fairly likely to be called 3. In fact only two 
kinds of error in immediate memory for digits 
occur, and these are order errors and omis- 
sions. The systematic changes in material 
that are characteristic of long term memory 
(2) do not seem to appear. 


Summary 


A test of immediate memory for eight-digit 
messages was given to 24 female telephone 
operators, using four different recall condi- 
tions. It was found that the presence of a 
redundant prefix significantly worsened recall. 
When the message was transcribed onto a 10- 
digit keysender, recall was not significantly 
better than when transcribed onto a telephone 
dial. But when a prefix digit was introduced, 
the dial proved to be an inferior method of 
transcription. It would seem that at about 
the level of difficulty when more than half the 
messages would be forgotten, recall would be 
improved by use of keysender rather than 
telephone dial. 

Recall errors were analyzed digit by digit. 
All errors could be classified into order errors 
and omissions. No evidence was found that 
certain digits would be systematically con- 
fused with certain others. 
AVA as a Predictor of Occupational Hierarchy * 

The Activity Vector Analysis (AVA) is a 
self concept instrument (4) measuring human 
temperament. It has been widely used in in- 
dustry as a tool for the classification and se- 
lection of personnel both in the management 
and worker hierarchies (2, 3, 5). AVA is 
based upon the fundamental premise that 
over and above basic aptitude and ability 
factors, successful performance in any job is 
largely a function of personal temperament 
and behavior. It is founded on a theory of 
personality (1) which postulates that all hu- 
man behavior can adequately be described in 
terms of four areas: aggressiveness, sociabil- 
ity, emotional stability, and social adapta- 
bility. 

AVA is a list of 81 nonderogatory words 
which may be used in describing human be- 
havior. The testee is required to first check 
those words which have ever been used by 
anyone in describing him (Column 1) and 
then to go back and check those words which 
he honestly believes to be descriptive of him- 
self (Column 2). Scores on this instrument 
are reported on a standard scale with X̄ = 50, 
σ = 10. Ordinary standard scores are ob- 
tained for each of four vectors representing 
a clustering of the words checked and indi- 
cating the following behaviors: 


V-1  Positive, approach behavior in an an- 
     tagonistic situation, real or imaginary 
V-2  Positive, approach behavior in a 
     friendly situation, real or imaginary 
V-3  Negative, withdrawal behavior in a 
     friendly situation, real or imaginary 
V-4  Negative, withdrawal behavior in an 
     antagonistic situation, real or imagi- 
     nary 


This study was designed to determine the 
extent to which AVA can measure the tem- 
peramental characteristics purported to dis- 
tinguish between male members of the mana- 
gerial-supervisory occupational level and those 
of the routine operation worker level. 


1 The principal contents of this article were pre- 
sented at the annual meeting of the Southeastern 
Psychological Association, Atlanta, Georgia, April 28, 
1958. 


Procedure 


Concurrent Sample 


The employees of a large industrial concern were 
divided into two occupational categories: higher and 
lower. All Ss included in this sample were males 
and had attained their individual occupational status 
prior to taking the AVA. No one was chosen as an 
S who was selected, transferred, or promoted to one 
of the occupations included in the a priori chosen 
categories as a result of the AVA. The higher level 
class consisted of executives, managers, and other 
management level supervisors. The lower level class 
consisted of mechanics, machinists, machine opera- 
tors, draftsmen, maintenance men, and laborers. No 
professional employees such as engineers and lawyers 
who were employed by this company were included 
in the sample. Only those occupations were studied 
in which progress from the worker to the managerial 
levels would be possible regardless of degree of for- 
mal education or training of the incumbent. Also 
those occupations which had been included in an 
earlier study (mainly those in the sales-clerical and 
general office fields) were excluded from the sam- 
ples drawn for this study. 

The median age of the members of the higher 
group was 37 with a range of 23 to 54. The median 
age of the members of the lower group was 35 with 
a range of 16 to 57. Hence, the two groups appear 
to be quite well matched with respect to age. 

The Ns for the samples of this study were 47 for 
the higher category and 112 for the lower category. 
An average resultant (Column 1 plus Column 2) pat- 
tern based on the four vector scores was obtained 
for the higher occupational category (hereafter re- 
ferred to as Group A) and the lower category (here- 
after referred to as Group B). These patterns are 
shown in Figure 1. Accompanying statistical data 
for these average profiles are presented in Table 1. 


Fig. 1. Average AVA resultant patterns for higher 
and lower occupational groups. 


Table 1 
Distribution and Serial Correlation Statistics for AVA Resultant Vector Scores 

            Higher (N = 47)    Lower (N = 112)            Both (N = 159)
Variate       X̄       σx         X̄       σx        σx       r        t        p
V-1         +4.51    7.92      -0.46    6.96      7.60    +.367    4.95    <.001
V-2         +5.32    7.22      -1.15    5.52      6.75    +.526    7.76    <.001
V-3         -5.15    5.57      +1.98    7.07      7.41    -.526    7.76    <.001
V-4         -5.45    5.32      -1.01    4.68      5.28    -.467    6.62    <.001

Note. These statistics are provided for the interest of the reader only. They were not directly employed in the analysis 
of the separation between the two groups of this study since AVA interpretation is made only on the basis of total pattern 
integration. 
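
The t values in Table 1 can be checked against the correlations in the same table. The sketch below assumes that the printed columns are to be read as a correlation between vector score and group membership and its t on N - 2 degrees of freedom; that reading is an interpretation of the printed table and is not stated by the authors.

```python
import math


def t_from_r(r, n):
    """t statistic for a correlation r based on n cases (n - 2 degrees of freedom)."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Correlations and t values as printed in Table 1, N = 159.
printed = {
    "V-1": (0.367, 4.95),
    "V-2": (0.526, 7.76),
    "V-3": (-0.526, 7.76),
    "V-4": (-0.467, 6.62),
}
for vector, (r, t_printed) in printed.items():
    t_computed = abs(t_from_r(r, 159))
    print(vector, round(t_computed, 2), t_printed)
```

The computed and printed values agree to within rounding of the three-decimal correlations, which supports reading the two columns as r and t.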

The standard scores for the individual vectors were 
transformed to deviation scores from the composite 
mean in order to remove the effect of activity level 
(total number of words checked). Hence, the pro- 
files constituting the set of discriminant variates in 
this study were expressed as sets of deviations about 
the individual S’s mean. The constant 25 was added 
to each deviation score so as to facilitate calcula- 
tions by removing all negative signs. 
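
The transformation to deviation scores can be sketched as follows; the input scores are hypothetical standard scores chosen only to make the arithmetic easy to follow.

```python
def deviation_profile(vector_scores, constant=25):
    """Express vector scores as deviations from the subject's own mean, plus a constant."""
    mean = sum(vector_scores) / len(vector_scores)
    return [score - mean + constant for score in vector_scores]


profile = deviation_profile([60, 55, 40, 45])  # hypothetical standard scores, mean 50
print(profile)  # [35.0, 30.0, 15.0, 20.0]
```

One consequence of this step is that every transformed profile sums to the same total (four times the constant), so the four scores of any one S are no longer free to vary independently.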

A Fisher Two-Group Discriminant Analysis (6) 
was applied to these data, and the discriminant func- 
tion resulting from this analysis of maximum sepa- 
ration was tested for statistical significance. 
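
A two-group discriminant function of the kind referred to here can be sketched as below. This is a generic textbook form, with weights proportional to the inverse within-group scatter matrix times the difference in group means; it is not claimed to reproduce the authors' computation, and the two-variable data are invented.

```python
def mean_vector(rows):
    """Column means of a list of equal-length score rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def pooled_within_scatter(group_a, group_b):
    """Sum of the within-group cross-product matrices of the two groups."""
    p = len(group_a[0])
    scatter = [[0.0] * p for _ in range(p)]
    for group in (group_a, group_b):
        m = mean_vector(group)
        for row in group:
            d = [x - mi for x, mi in zip(row, m)]
            for i in range(p):
                for j in range(p):
                    scatter[i][j] += d[i] * d[j]
    return scatter


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("within-group scatter matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def fisher_weights(group_a, group_b):
    """Discriminant weights proportional to W^-1 (mean_A - mean_B)."""
    diff = [ma - mb for ma, mb in zip(mean_vector(group_a), mean_vector(group_b))]
    return solve(pooled_within_scatter(group_a, group_b), diff)


def discriminant_score(weights, row):
    """Weighted sum of one subject's scores."""
    return sum(w * x for w, x in zip(weights, row))


# Invented two-variable data, for illustration only.
group_a = [[30, 28], [32, 25], [29, 27], [33, 30]]
group_b = [[24, 22], [26, 25], [23, 21], [25, 24]]
weights = fisher_weights(group_a, group_b)
mean_a = sum(discriminant_score(weights, r) for r in group_a) / len(group_a)
mean_b = sum(discriminant_score(weights, r) for r in group_b) / len(group_b)
print(mean_a > mean_b)
```

The singular-matrix check matters for profiles expressed as deviations about each S's own mean: because such scores sum to a constant, the full set of four cannot be entered together in this form without dropping one variate or using a generalized inverse.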

Centour Scores (7) were also derived for the pur- 
pose of establishing confidence values to be used in 
the predictions of class membership for an independ- 
ent sample. 


Prediction Sample 


A second sample (N = 76) was drawn at random 
from a population of male employees similar to those 
of the original sample. Appropriate discriminant 
weights were applied to the vector scores and a dis- 
criminant score was derived for each member of this 
sample. Then employing the Centour Score distribu- 
tion as a basis for prediction, the class membership 
of each of the 76 Ss was determined. Comparisons 
between predicted and actual class membership were 
then made and the differences were tested for signifi- 
cance by the χ² test. 

A further test was made of the power of AVA in 
determining occupational differentiation. A trained 
AVA Analyst² was shown the above two average 
patterns and was then asked to classify each of the 
76 Ss into either Class A or B on the basis of 
the S’s own individual resultant pattern. He made 
the prediction using only his understanding of AVA 
theory and his knowledge of the location of 258 
pattern shapes on a global universe. He was 
told how many of the Ss were in Group A and 
Group B, but he was completely unaware of the 
actual job status of the Ss he was asked to classify. 
The predictions were made solely on the basis of 
correlations between individual profiles and the two 
reference patterns (Fig. 1). These data were ob- 
tained from a table of correlation coefficients with 
which he was provided. Comparisons between pre- 
dicted and actual class membership were then made 
as in the previous step. 


2 Person trained by Walter V. Clarke to apply 
AVA theory in the administration and interpretation 
of the instrument. 


Results and Discussion 
Concurrent Sample 


Discriminant analysis applied to the prob- 
lem of distinguishing between the two classes 
of the original sample on the basis of AVA 
resultant patterns produced the discriminant 
weights reported in Table 2. 


Table 2 

Discriminant Weights for Vector Scores of 
Two Classes 

                               Mean Difference         Discriminant
Variate                        (Class A - Class B)     Weight
Aggressiveness (V-1)                 4.96600
Sociability (V-2)                    6.47094
Emotional Stability (V-3)           -7.13108
Social Adaptability (V-4)           -4.43788
Table 3 

Analysis of Maximum Separation Between 
Two Groups 

Source of                Sum of        Mean
Variation        df      Squares       Square
Function          4      .004635       .001158
Within          154      .011834       .000076
Total           158      .016469                    F = 15.2368


The data reveal that as a group the mem- 
bers of the upper occupational class appear to 
be more aggressive and socially confident, 
while those of the lower stratum appear to be 
more placid and submissive. This pattern 
differentiation is consistent with AVA theory 
which states that leadership qualities as indi- 
cated by outgoing, self-initiating behavior are 
necessary requisites of successful performance 
in managerial-supervisory positions whereas 
tendencies toward more relaxed and depend- 
ent behavior are important at the routine- 
operational level to insure high quality and 
quantity of production. 

The analysis of variance test applied to the 
analysis of maximum separation between the 
two groups yielded an F value of 15.2368. 
These data are presented in Table 3. 

For the 4 and 154 degrees of freedom upon 
which this statistic is based, statistical sig- 
nificance is indicated beyond the .001 level. 
Hence, there is ample evidence that the AVA 
patterns differ between members of higher and 
lower occupational classes in this industrial 
population. 
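
The F ratio can be recomputed from the entries of Table 3. The sketch reads the printed figures as sums of squares of .004635 and .011834 on 4 and 154 degrees of freedom and mean squares of .001158 and .000076; the observation that the reported F matches the ratio of the rounded mean squares is an inference from that arithmetic and is not stated by the authors.

```python
def f_ratio(ss_between, df_between, ss_within, df_within):
    """F ratio computed from sums of squares and their degrees of freedom."""
    return (ss_between / df_between) / (ss_within / df_within)


# Sums of squares and degrees of freedom as printed in Table 3.
f_unrounded = f_ratio(0.004635, 4, 0.011834, 154)

# Ratio of the mean squares as printed (already rounded to six decimals).
f_from_printed_ms = 0.001158 / 0.000076

print(round(f_unrounded, 2))        # 15.08
print(round(f_from_printed_ms, 4))  # 15.2368
```

Either value lies well beyond the .001 point for 4 and 154 degrees of freedom, so the conclusion drawn in the text does not depend on which figure is used.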

In developing a guide to use in the classifi- 
cation of the members of the prediction sam- 
ples on the basis of AVA vector scores, Cen- 
tour scores were derived from the discriminant 
score distribution for these samples. 

In deriving the individual discriminant 
scores on which the Centour scores are based, 
the discriminant weights were applied to the 
appropriate vector scores of the individual Ss. 
Then since differentiation is independent of 
the units used, the discriminant scores were 
transformed to a new scale (1,000A + 50) in 
order to remove the decimals and the nega- 
tive signs. It was found that the discriminant 
scores with the greatest predictive value for 
the higher occupational category were those 
above 36. For the lower occupational group 
the discriminant scores with the highest pre- 
dictive value were those below 36. The score 
of 36 appeared to have little predictive value 
for either group since the probabilities of cor- 
rect classification were nearly equal for both. 
For Group A, the mean discriminant score 
was 42; for Group B, the mean was 30. 


Prediction Sample 


Discriminant scores were calculated for 
each of the 76 Ss in this sample employing 
the discriminant weights derived from the 
data of the concurrent sample. Then using 
the Centour score data, the Ss were classified 
into one of the two occupational classes. In 
this classification process, the discriminant 
score of 36 was used as the Group A cutoff 
and 35 was used for the Group B cutoff. The 
score of 36 was used as the Group A cutoff 
because there was a slightly greater chance 
of correct classification in this category than 
for Group B. The results of this classifica- 
tion procedure are summarized in Table 4. 
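
The rescaling and cutoff rule can be sketched as below. The raw discriminant values are hypothetical, and the sketch assumes that the quantity multiplied by 1,000 in the rescaling formula is the raw discriminant value; rounding to whole numbers is likewise an assumption.

```python
def rescale(discriminant_value):
    """Rescale a raw discriminant value to the reporting scale: 1,000 x value + 50."""
    return round(1000 * discriminant_value + 50)


def classify(score, cutoff_a=36):
    """Assign Group A at or above the cutoff and Group B at 35 or below."""
    return "A" if score >= cutoff_a else "B"


# Hypothetical raw values chosen to land on the two group means and the cutoff.
for raw in (-0.008, -0.014, -0.020):
    score = rescale(raw)
    print(score, classify(score))
```

Because the scores are whole numbers, a Group A cutoff of 36 and a Group B cutoff of 35 partition the scale without overlap.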

Out of a total sample of 76 Ss, 62 were 
predicted correctly as to hierarchical member- 
ship. The χ² value for the data of this four- 
fold table is 27.34 and is significant beyond 
the .001 level of significance. 

The results of the AVA Analyst proved to 
be equally good in predicting the dichotomy. 
Because he judged the shapes of 6 patterns to 
be invalid the total sample was reduced to 70 
for his predictions. The profiles of the 6 in- 
dividuals were nearly straight vertical lines so 
that no vector emphasis was indicated. It is 
interesting to note, however, that all were 
members of the lower category and this find- 
ing is consistent with that of an earlier study 
in which an identical number of nonclassifi- 
able patterns was obtained and they also were 
all from the lower group. The results of these 
predictions are reported in Table 5. 


Table 4 

Predicted vs. Actual Occupational Classifications 
by Experimenters 

               Predicted
Actual        B        A      Total
A                               34
B                               42
Total                           76


Table 5 

Predicted vs. Actual Occupational Classifications 
by AVA Analyst 

Actual      (one column of predicted counts legible)
A              8
B             32
Total         40

The reason why the AVA Analyst did not 
attempt to classify these 6 Ss is that in the 
normal processing of AVA results such pat- 
terns are not interpreted. However, referring 
to Figure 1, there appears to be a sound ba- 
sis for the classification of these Ss into Cate- 
gory B where the average profile is much less 
extended as compared to the same for Cate- 
gory A. Out of a total of 70 Ss, 54 were pre- 
dicted correctly as to class membership. The 


χ² value for the data of this table is 17.79 
and is significant beyond the .001 level of 
significance. Of 70 common predictions made 
by the experimenter and the AVA Analyst, 66 
proved to be identical. 
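
The two χ² values can be illustrated with a fourfold-table computation. Two assumptions are made and should be read as such: that the values were computed with Yates' continuity correction, which the text does not state, and that the cell counts used below are hypothetical counts chosen to agree with the reported sample sizes and numbers of correct predictions, not figures copied from Tables 4 and 5.

```python
def yates_chi_square(a, b, c, d):
    """Chi-square with Yates' continuity correction for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    numerator = n * max(abs(a * d - b * c) - n / 2, 0) ** 2
    return numerator / (row1 * row2 * col1 * col2)


# Hypothetical counts: rows are actual A and B, columns are predicted A and B.
# Experimenters: 76 Ss, 25 + 37 = 62 correct, row totals 34 and 42.
chi_experimenters = yates_chi_square(25, 9, 5, 37)
# Analyst: 70 Ss, 22 + 32 = 54 correct.
chi_analyst = yates_chi_square(22, 8, 8, 32)

print(round(chi_experimenters, 2))  # 27.34
print(round(chi_analyst, 2))        # 17.79
```

That these counts reproduce both reported values suggests, without establishing, that the correction was applied. With one degree of freedom the .001 critical value is 10.83, so both statistics are significant beyond that level, as the text reports.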


Summary and Conclusions 


Two different experimental approaches were 
studied with reference to the measurement of 
differences in temperament possessed by male 
members of higher and lower occupational 
classes in an industrial population, and in 
using this information to predict class mem- 
bership. One involved a rigorous statistical 
analysis technique; the other, a rather sim- 
ple and unsophisticated procedure. Both 
methods proved to be highly successful in 
this respect and have attested to the power 
of AVA in the hierarchical classification of 
nonprofessional male employees on the basis 
of temperament characteristics. The findings 
of this study confirm the existence of differ- 
ences in temperament characteristics of per- 
sonnel of higher and lower echelons of em- 
ployment which were found in an earlier 
study based on mixed-sex samples drawn from 
a business population. These findings further 
confirm the power and efficiency of AVA in 
measuring these differences, and in providing 
the proper classification of personnel accord- 
ingly. They also suggest temperament cri- 
teria to be evaluated in personnel selection 
and assignment, and when considering the 
promotion of nonprofessional male employees 
to supervisory and managerial levels. 

The following conclusions are held to be 
tenable from the data of this study: 

1. Differences in temperament character- 
istics exist between employees of higher and 
lower echelons. 

2. AVA can be efficiently used in the hier- 
archical classification of male industrial em- 
ployees. 